Reputation: 35
This might be a very basic question, as I am a beginner with PySpark. I have read a CSV file and am trying to apply some PySpark functions such as filter, split, or replace to it, but I am getting an error. This is my code:
emp_data = spark\
.read\
.format('csv')\
.option("inferSchema","true")\
.option("header","true")\
.load("/FileStore/tables/employee_earnings_report_2016-1.csv")
After reading the file I applied a filter, which runs fine:
import pyspark.sql.functions as f
df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125)).show(5)
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
| NAME| REGULAR|RETRO| OTHER| OVERTIME|INJURED|DETAIL|QUINN/EDUCATION INCENTIVE|TOTAL EARNINGS|POSTAL|Gender|
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
| Abbasi,Sophia| $18,249.83| NA| NA| NA| NA| NA| NA| $18,249.83| 2148| M|
|Abbruzzese,Angela| $5,000.90| NA| NA| NA| NA| NA| NA| $5,000.90| 2125| M|
| Abbruzzese,Donna| $621.90| NA| NA| NA| NA| NA| NA| $621.90| 2125| M|
| Abdelrahim,Maha| $1,181.60| NA| NA| NA| NA| NA| NA| $1,181.60| 2125| M|
| Abron,Wayne E.|$103,667.51| NA|$550.00|$13,696.60| NA| NA| NA| $117,914.11| 2125| M|
+-----------------+-----------+-----+-------+----------+-------+------+-------------------------+--------------+------+------+
Next I want to apply the split function to the 'NAME' column and extract the first element of the name. I am doing it this way:
df_new = df.select(f.split(f.col("NAME"), ',')).show(3)
This yields the following error:
AttributeError: 'NoneType' object has no attribute 'select'
How can I resolve this error?
Upvotes: 2
Views: 20379
Reputation: 2698
The issue is caused by this line:
df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125)).show(5)
show() prints the rows and returns None, so with .show(5) at the end, df is assigned None instead of a PySpark DataFrame.
Therefore, when you run
df_new = df.select(f.split(f.col("NAME"), ',')).show(3)
you get the error AttributeError: 'NoneType' object has no attribute 'select'.
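The effect of ending a chain with a call that only prints can be reproduced in plain Python without Spark. The sketch below uses a hypothetical stand-in class, TinyFrame, which is not part of PySpark and only mimics the two relevant behaviours: filter returns a new object, while show prints and returns nothing.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame; not part of PySpark."""

    def __init__(self, rows):
        self.rows = rows

    def filter(self, predicate):
        # Returns a new frame, so further calls can be chained onto it.
        return TinyFrame([r for r in self.rows if predicate(r)])

    def show(self, n=20):
        # Only prints; with no return statement, Python returns None.
        for row in self.rows[:n]:
            print(row)


frame = TinyFrame([{"POSTAL": 2148}, {"POSTAL": 2125}, {"POSTAL": 2021}])

# Ending the chain with show() stores its return value, which is None.
broken = frame.filter(lambda r: r["POSTAL"] in (2148, 2125)).show(5)

# Keeping the filtered frame and showing it separately preserves the object.
kept = frame.filter(lambda r: r["POSTAL"] in (2148, 2125))
kept.show(5)

print(type(broken).__name__)
print(len(kept.rows))
```

Calling broken.filter(...) here would raise the same kind of AttributeError on 'NoneType' as in the question.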
A better way to do this would be to use:
df = emp_data.filter((f.col("POSTAL") == 2148) | (f.col("POSTAL") == 2125))
df.show(5)
In a Databricks notebook, you can also use display(df) for a styled display.
Upvotes: 7