Modes of handling corrupt data

Today, we deal with substantial amounts of data, and we cannot assume that every record is free from corruption. When reading text-based sources such as CSV and JSON, PySpark gives us three parse modes for handling corrupted records.

Let's delve into it!

  1. Permissive Mode -

    In this mode, PySpark keeps every row and assigns null values to the malformed fields of corrupted records while reading. This is suitable for scenarios where a few corrupted records will not hinder your ability to gain insights; a fuller sketch follows below.

     spark.read.option("mode", "permissive").csv("testData.csv")
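
    A useful companion feature in permissive mode is capturing the raw text of each malformed row for later inspection. Here is a hedged sketch: the columnNameOfCorruptRecord option and the _corrupt_record column are standard Spark features, while the file name and the id/name schema are illustrative assumptions.

     from pyspark.sql import SparkSession
     from pyspark.sql.types import StructType, StructField, IntegerType, StringType

     spark = SparkSession.builder.appName("permissive-demo").getOrCreate()

     # The schema includes a string column to hold the raw text of malformed rows.
     schema = StructType([
         StructField("id", IntegerType(), True),
         StructField("name", StringType(), True),
         StructField("_corrupt_record", StringType(), True),
     ])

     df = (spark.read
           .schema(schema)
           .option("mode", "PERMISSIVE")
           .option("columnNameOfCorruptRecord", "_corrupt_record")
           .csv("testData.csv"))

     # Cache before inspecting: Spark disallows queries on a raw CSV/JSON
     # source that reference only the internal corrupt-record column.
     df.cache()

     # Malformed rows have null data fields and the raw line preserved here.
     df.filter(df["_corrupt_record"].isNotNull()).show(truncate=False)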
    
  2. Drop Malformed Mode -

    This mode suits situations with stringent data-quality requirements where losing the corrupted rows is acceptable: PySpark silently drops any row containing a malformed record during the reading process. The sketch below shows how to measure what was dropped.

     spark.read.option("mode", "dropMalformed").json("testData.json")
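
    Because the drop happens silently, it is worth checking how many rows were lost. A minimal sketch, assuming a hypothetical testData.json read twice against the same illustrative schema:

     from pyspark.sql import SparkSession
     from pyspark.sql.types import StructType, StructField, IntegerType, StringType

     spark = SparkSession.builder.appName("dropmalformed-demo").getOrCreate()

     schema = StructType([
         StructField("id", IntegerType(), True),
         StructField("name", StringType(), True),
     ])

     # Read the same file in both modes; the difference in counts is the
     # number of malformed records that DROPMALFORMED silently discarded.
     permissive_df = (spark.read.schema(schema)
                      .option("mode", "PERMISSIVE")
                      .json("testData.json"))
     dropped_df = (spark.read.schema(schema)
                   .option("mode", "DROPMALFORMED")
                   .json("testData.json"))

     print(permissive_df.count() - dropped_df.count())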
    
  3. FailFast Mode -

    When we cannot afford to process any corrupted data, this mode fails fast: PySpark throws an exception as soon as it encounters a malformed record, surfacing the problem immediately instead of letting it propagate. A sketch of catching that failure follows below.

     spark.read.option("mode", "FAILFAST").csv("testData.csv")
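
    Since Spark reads lazily, the failure typically surfaces when an action forces the file to be parsed. A minimal sketch of catching it, with the file name and schema as illustrative assumptions:

     from pyspark.sql import SparkSession
     from pyspark.sql.types import StructType, StructField, IntegerType, StringType

     spark = SparkSession.builder.appName("failfast-demo").getOrCreate()

     schema = StructType([
         StructField("id", IntegerType(), True),
         StructField("name", StringType(), True),
     ])

     try:
         (spark.read.schema(schema)
          .option("mode", "FAILFAST")
          .csv("testData.csv")
          .count())  # the action triggers parsing; a malformed row raises here
     except Exception as e:
         print(f"Aborted on corrupt data: {e}")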
    

By default, PySpark is configured in Permissive mode, but we have the flexibility to select the appropriate mode based on our specific requirements.

Happy handling!