Reputation: 3043
I want to multiply a column (say x3) of a PySpark dataframe (say df) with a scalar (say 0.1). Below is an example of a dataframe that I have:
df = sqlContext.createDataFrame(
    [(1, "a", 5.0), (3, "B", 21.0)], ("x1", "x2", "x3"))
df.show()
+---+---+----+
| x1| x2| x3|
+---+---+----+
| 1| a| 5.0|
| 3| B|21.0|
+---+---+----+
Below is what I am trying at present:
from pyspark.sql import functions as F

df_new = df.withColumn("norm_x3", 0.1 * F.col("x3"))
df_new = df_new.select([c for c in df_new.columns if c not in {'x3'}])
The method which I am trying above gives the expected output which is:
+---+---+-------+
| x1| x2|norm_x3|
+---+---+-------+
| 1| a| 0.5|
| 3| B| 2.1|
+---+---+-------+
Is there a more elegant and short way of doing the same thing? Thanks.
Upvotes: 0
Views: 4059
Reputation: 101
The most elegant way would be simply using drop:
df_new = df.withColumn("norm_x3", 0.1*F.col( "x3")).drop("x3")
Alternatively, you can also use withColumnRenamed, but it is less preferable because you're overloading "x3", which could cause confusion in the future:
df_new = df.withColumn("x3", 0.1*F.col( "x3")).withColumnRenamed("x3", "norm_x3")
Upvotes: 4
Reputation: 214927
Here's one way to do it in one line:
df.select([(df[c] * 0.1).alias('norm_x3') if c == 'x3' else df[c] for c in df.columns])
Or:
df.selectExpr('*', 'x3 * 0.1 as norm_x3').drop('x3')
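Both one-liners give the same result as the drop approach above; a quick check against the question's df (a sketch, assuming the column should be named norm_x3):

df.selectExpr('*', 'x3 * 0.1 as norm_x3').drop('x3').show()
+---+---+-------+
| x1| x2|norm_x3|
+---+---+-------+
|  1|  a|    0.5|
|  3|  B|    2.1|
+---+---+-------+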
Upvotes: 3