jmizzle87

Reputation: 11

Dataframe Won't Print

import pyspark.sql.functions as f

df_ssaGenderWithinTenPercent = df_ssaGender.select("name", "women", "men", "total", "gender", "gender_ratio", \
f.when((df_ssaGender.gender_ratio >.45) & (df_ssaGender.gender_ratio < .55) & (df_ssaGender.gender_ratio >= 10000)).orderBy("gender", "gender_ratio", ascending = False)
df_ssaGenderWithinTenPercent.show()

I've previously created a DataFrame called df_ssaGender and am selecting those columns from it. I need the rows with a gender_ratio between 45% and 55%. However, whenever I run this I get a syntax error, even though I'm pretty sure the code is right. Any ideas?


Upvotes: 1

Views: 32

Answers (1)

pltc

Reputation: 6082

Breaking your code down line by line, I found two places where you're missing something:

df_ssaGenderWithinTenPercent = (df_ssaGender
  .select(
    "name",
    "women",
    "men",
    "total",
    "gender",
    "gender_ratio",
    f.when(
      (df_ssaGender.gender_ratio >.45) &
      (df_ssaGender.gender_ratio < .55) &
      (df_ssaGender.gender_ratio >= 10000) # f.when() also needs a second argument after this condition: the value to return
    )
  ) # this closing parenthesis for select() was missing
  .orderBy("gender", "gender_ratio", ascending = False)
)
df_ssaGenderWithinTenPercent.show()
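
If it helps to see that the two problems fail at different stages, here is a rough sketch in plain Python (no Spark needed). It only assumes that f.when takes two required arguments, a condition and a value. The shortened statements and the stand-in `when` function are made up for illustration and are not your real code or the real PySpark function.

```python
# The question's statement reduced to its bracket structure.
# The names (df, a) are placeholders; compile() only parses, it never runs them.
original = 'result = df.select("name", when((a > .45) & (a < .55)).orderBy("gender")\n'
fixed = 'result = df.select("name", when((a > .45) & (a < .55))).orderBy("gender")\n'


def syntax_error_of(source):
    """Return the SyntaxError raised while compiling source, or None if it parses."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as err:
        return err
    return None


def when(condition, value):
    """Hypothetical stand-in with the same two required parameters as f.when."""
    return (condition, value)


# Problem 1: the unclosed select( is caught by the parser, before anything runs.
print("original:", syntax_error_of(original))
print("fixed:   ", syntax_error_of(fixed))

# Problem 2: leaving out the return value is not a syntax error; it fails at call time.
try:
    when(True)
    missing_value_error = None
except TypeError as err:
    missing_value_error = err
print("when(True):", missing_value_error)
```

So adding the parenthesis gets rid of the syntax error, but the statement will still fail when it runs until when() is given a value to return.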

Upvotes: 3
