Anil

Reputation: 33

Pyspark DataFrame Filtering

I have a dataframe as follows:

|Property ID|Location|Price|Bedrooms|Bathrooms|Size|Price SQ Ft|Status|

When I filter it on Bedrooms or Bathrooms, it gives the correct answer:

df = spark.read.csv('/FileStore/tables/realestate.txt', header=True, inferSchema=True, sep='|')
df.filter(df.Bedrooms==2).show()

But when I filter on Property ID as df.filter(df.Property ID==1532201).show(), I get an error. Is it because there is a space between Property and ID?

Upvotes: 0

Views: 357

Answers (2)

mck

Reputation: 42422

You can also use the square bracket notation to select the column:

df.filter(df['Property ID'] == 1532201).show()

Or use a raw SQL string to filter (note the backticks around the column name):

df.filter('`Property ID` = 1532201').show()

Upvotes: 1

Anand Vidvat

Reputation: 1058

The space between Property and ID is the cause of the issue. Another approach you can follow is:

from pyspark.sql import functions as F
df.filter(F.col('Property ID')==1532201).show()
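The failure is plain Python, not Spark: an attribute name after a dot must be a valid identifier, so the parser rejects df.Property ID before Spark ever sees it. A minimal sketch (using a hypothetical FakeRow class, no Spark required) shows the same behavior:

```python
class FakeRow:
    """Stand-in object with a space in one attribute name (hypothetical)."""
    pass

row = FakeRow()
setattr(row, "Property ID", 1532201)

# Dot access is a syntax error -- the parser stops at the space:
try:
    compile("row.Property ID", "<example>", "eval")
except SyntaxError:
    print("SyntaxError: an attribute after a dot cannot contain a space")

# Name-based lookup (analogous to df['Property ID'] or F.col('Property ID'))
# works fine, because the name is passed as a string:
print(getattr(row, "Property ID"))  # 1532201
```

This is why the bracket notation and F.col both succeed: they take the column name as a string, sidestepping Python's identifier rules.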

Upvotes: 2

Related Questions