Reputation: 3
In pandas, I can successfully run the following:
def car(t):
    if t in df_a:
        return df_a[t] / df_b[t]
    else:
        return 0
But how can I do the exact same thing with a Spark DataFrame? Many thanks!
The data looks like this:
df_a
a 20
b 40
c 60
df_b
a 80
b 50
e 100
The result should be 0.25 for the input car("a").
Upvotes: 0
Views: 113
Reputation: 18042
First you have to join both DataFrames, then filter by the key you want and select the operation you need.
df_a = sc.parallelize([("a", 20), ("b", 40), ("c", 60)]).toDF(["key", "value"])
df_b = sc.parallelize([("a", 80), ("b", 50), ("e", 100)]).toDF(["key", "value"])
def car(c):
    return (
        df_a.join(df_b, on=["key"])
        .where(df_a["key"] == c)
        .select((df_a["value"] / df_b["value"]).alias("ratio"))
        .head()
    )
car("a")
# Row(ratio=0.25)
Upvotes: 3