Reputation: 11
import pyspark.sql.functions as f
df_ssaGenderWithinTenPercent = df_ssaGender.select("name", "women", "men", "total", "gender", "gender_ratio", \
f.when((df_ssaGender.gender_ratio >.45) & (df_ssaGender.gender_ratio < .55) & (df_ssaGender.gender_ratio >= 10000)).orderBy("gender", "gender_ratio", ascending = False)
df_ssaGenderWithinTenPercent.show()
I previously created a data frame called df_ssaGender and am selecting the columns shown above. I need the rows with a gender_ratio between 45% and 55%. However, whenever I run this, I get a syntax error, even though I'm pretty sure the code is right. Any ideas?
Upvotes: 1
Views: 32
Reputation: 6082
Breaking your code down line by line, I found two places where something is missing:
df_ssaGenderWithinTenPercent = (df_ssaGender
    .select(
        "name",
        "women",
        "men",
        "total",
        "gender",
        "gender_ratio",
        f.when(
            (df_ssaGender.gender_ratio > .45) &
            (df_ssaGender.gender_ratio < .55) &
            (df_ssaGender.gender_ratio >= 10000)  # f.when() also needs a return value as its second argument
        )
    )  # you were missing this closing parenthesis for select()
    .orderBy("gender", "gender_ratio", ascending=False)
)
df_ssaGenderWithinTenPercent.show()
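If you want to confirm the parenthesis diagnosis without a Spark session, one option is to let Python parse the statement without running it. The sketch below uses the built-in `compile()`; the helper name `syntax_error_in` is made up for illustration, and both snippets are abbreviated versions of the statement in the question (fewer columns and conditions), not the exact original.

```python
from typing import Optional


def syntax_error_in(source: str) -> Optional[SyntaxError]:
    """Parse `source` without executing it; return the SyntaxError, if any."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err
    return None


# Abbreviated form of the question's statement: select( is opened but never closed.
original = (
    'out = df_ssaGender.select("name", "gender_ratio", '
    'f.when((df_ssaGender.gender_ratio > .45) & (df_ssaGender.gender_ratio < .55))'
    '.orderBy("gender", "gender_ratio", ascending=False)\n'
    'out.show()\n'
)

# Same statement with the closing parenthesis for select( added before .orderBy.
fixed = (
    'out = df_ssaGender.select("name", "gender_ratio", '
    'f.when((df_ssaGender.gender_ratio > .45) & (df_ssaGender.gender_ratio < .55)))'
    '.orderBy("gender", "gender_ratio", ascending=False)\n'
    'out.show()\n'
)

for label, source in (("original", original), ("fixed", fixed)):
    err = syntax_error_in(source)
    print(label, "->", "parses" if err is None else f"SyntaxError: {err.msg}")
```

Parsing only checks syntax. The fixed snippet would still fail when executed, because `f.when` takes a condition and a value, which is the second problem noted above.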
Upvotes: 3