AttributeError: 'NoneType' object has no attribute 'select' | PySpark

Question

This might be a very basic question, as I am a beginner to PySpark. I have read a CSV file and am trying to apply some PySpark functions such as filter, split or replace to it, but I am getting an error. This is my code:

emp_data = spark\
            .read\
            .format('csv')\
            .option("inferSchema","true")\
            .option("header","true")\
            .load("/FileStore/tables/employee_earnings_report_2016-1.csv")

After reading the file, I applied a filter, which runs fine:

import pyspark.sql.functions as f
df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125)).show(5)

+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
|             NAME|    REGULAR|RETRO|  OTHER|  OVERTIME|INJURED|DETAIL|QUINN/EDUCATION INCENTIVE|TOTAL EARNINGS|POSTAL|Gender|
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
|    Abbasi,Sophia| $18,249.83|   NA|     NA|        NA|     NA|    NA|                       NA|    $18,249.83|  2148|     M|
|Abbruzzese,Angela|  $5,000.90|   NA|     NA|        NA|     NA|    NA|                       NA|     $5,000.90|  2125|     M|
| Abbruzzese,Donna|    $621.90|   NA|     NA|        NA|     NA|    NA|                       NA|       $621.90|  2125|     M|
|  Abdelrahim,Maha|  $1,181.60|   NA|     NA|        NA|     NA|    NA|                       NA|     $1,181.60|  2125|     M|
|   Abron,Wayne E.|$103,667.51|   NA|$550.00|$13,696.60|     NA|    NA|                       NA|   $117,914.11|  2125|     M|
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+

Next, I want to apply the split function to the 'NAME' column and extract the first element (index 0) of the name. I am doing it this way:

df_new = df.select(f.split(f.col("NAME"), ',')).show(3)

This yields the following error:

AttributeError: 'NoneType' object has no attribute 'select'

How can I resolve this error?

The Singularity · Accepted Answer

The issue is caused by this line:

df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125)).show(5)

Adding .show(5) at the end means df is no longer a PySpark DataFrame: show() prints the rows and returns None, so df is assigned None (of type NoneType).

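Therefore, when you run df_new = df.select(f.split(f.col("NAME"), ',')).show(3), Python looks up select on None and raises AttributeError: 'NoneType' object has no attribute 'select'. The trailing .show(3) on that line would likewise leave df_new set to None.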
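The same failure can be reproduced without Spark. The sketch below uses a hypothetical Frame class (not part of PySpark) whose transformations return a new object while show() only prints, mirroring how DataFrame.show() returns None. The rows are taken from the output in the question.

```python
class Frame:
    """Hypothetical stand-in for a Spark DataFrame (not a real PySpark class)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, predicate):
        # Transformations return a new Frame, so calls can be chained.
        return Frame(r for r in self.rows if predicate(r))

    def select(self, func):
        return Frame(func(r) for r in self.rows)

    def show(self, n=20):
        # Like DataFrame.show(), this only prints; it implicitly returns None.
        for row in self.rows[:n]:
            print(row)


emp = Frame([("Abbasi,Sophia", 2148), ("Abbruzzese,Angela", 2125), ("Abron,Wayne E.", 2125)])

# Buggy: df holds the return value of show(), which is None.
df = emp.filter(lambda r: r[1] in (2148, 2125)).show(5)
print(type(df))  # <class 'NoneType'>

try:
    df.select(lambda r: r[0].split(",")[0])
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'select'

# Fixed: keep the Frame, and call show() as a separate statement.
df = emp.filter(lambda r: r[1] in (2148, 2125))
df.show(5)
df.select(lambda r: r[0].split(",")[0]).show()
```

The point is general Python: whatever the last method in a chain returns is what gets assigned, so a method that only prints should not be the end of an assignment you intend to reuse.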

A better way is to assign the DataFrame first and call show() as a separate statement:

df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125))
df.show(5)

In a Databricks notebook, you can also use display(df) for a styled display.
