Anil

Reputation: 33

Pyspark DataFrame Filtering

I have a dataframe as follows:

|Property ID|Location|Price|Bedrooms|Bathrooms|Size|Price SQ Ft|Status|

When I filter it on Bedrooms or Bathrooms, it gives the correct answer:

df = spark.read.csv('/FileStore/tables/realestate.txt', header=True, inferSchema=True, sep='|')
df.filter(df.Bedrooms==2).show()

But when I filter on Property ID as df.filter(df.Property ID==1532201).show(), I get an error. Is it because there is a space between Property and ID?

Upvotes: 0

Views: 357

Answers (2)

mck

Reputation: 42422

You can also use the square bracket notation to select the column:

df.filter(df['Property ID'] == 1532201).show()

Or use a raw SQL string to filter (note the backticks around the column name):

df.filter('`Property ID` = 1532201').show()

Upvotes: 1

Anand Vidvat

Reputation: 1058

The space between Property and ID is the cause of the issue. Another approach you can follow is:

from pyspark.sql import functions as F
df.filter(F.col('Property ID')==1532201).show()
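The failure is plain Python, not Spark: an attribute name after a dot must be a valid identifier, so the parser rejects df.Property ID before Spark ever sees it. A minimal sketch (using a hypothetical FakeRow class, no Spark required) shows the same behavior:

```python
class FakeRow:
    """Stand-in object with a space in one attribute name (hypothetical)."""
    pass

row = FakeRow()
setattr(row, "Property ID", 1532201)

# Dot access is a syntax error -- the parser stops at the space:
try:
    compile("row.Property ID", "<example>", "eval")
except SyntaxError:
    print("SyntaxError: an attribute after a dot cannot contain a space")

# Name-based lookup (analogous to df['Property ID'] or F.col('Property ID'))
# works fine, because the name is passed as a string:
print(getattr(row, "Property ID"))  # 1532201
```

This is why the bracket notation and F.col both succeed: they take the column name as a string, sidestepping Python's identifier rules.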

Upvotes: 2

Related Questions