Reputation: 823
Assuming that I have the following Spark DataFrame df:
+-----+-------+-------+-------+
| id | col1 | col2 | col3 |
+-----+-------+-------+-------+
| "a" | 10 | 5 | 75 |
| "b" | 20 | 3 | 3 |
| "c" | 30 | 2 | 65 |
+-----+-------+-------+-------+
I want to create a new DataFrame new_df that contains:
1) the id of each row,
2) the result of the division col1 / col2, and
3) the result of the division col3 / col1.
The desired output for new_df is:
+-----+-------+-------+
| id | col1_2| col3_1|
+-----+-------+-------+
| "a" | 2 | 7.5 |
| "b" | 6.67 | 0.15 |
| "c" | 15 | 2.17 |
+-----+-------+-------+
I have already tried
new_df = df.select("id").withColumn("col1_2", df["col1"] / df["col2"])
without any luck.
Upvotes: 0
Views: 90
Reputation: 214957
Either use select (note that df.select("id") keeps only the id column, which is why chaining withColumn afterwards cannot reference col1 or col2):
df.select('id',
(df.col1 / df.col2).alias('col1_2'),
(df.col3 / df.col1).alias('col3_1')
).show()
+---+-----------------+------------------+
| id| col1_2| col3_1|
+---+-----------------+------------------+
| a| 2.0| 7.5|
| b|6.666666666666667| 0.15|
| c| 15.0|2.1666666666666665|
+---+-----------------+------------------+
Or selectExpr:
df.selectExpr('id', 'col1 / col2 as col1_2', 'col3 / col1 as col3_1').show()
+---+-----------------+------------------+
| id| col1_2| col3_1|
+---+-----------------+------------------+
| a| 2.0| 7.5|
| b|6.666666666666667| 0.15|
| c| 15.0|2.1666666666666665|
+---+-----------------+------------------+
Upvotes: 2