Reputation: 59
I want to create a column whose values are taken from another column when certain conditions are met. The new column, first, should hold the units value of the row with the lowest share among all rows that have the same gender, week and type.
I have the following dataframe:
+------+----+----+-------------+-------------------+
|gender|week|type|        share|              units|
+------+----+----+-------------+-------------------+
|  Male|  37|Polo|         0.01|             1809.0|
|  Male|  37|Polo|          0.1|             2327.0|
|  Male|  37|Polo|         0.15|             2982.0|
|  Male|  37|Polo|          0.2|             3558.0|
|  Male|  38|Polo|         0.01|             1700.0|
|  Male|  38|Polo|          0.1|             2245.0|
|  Male|  38|Polo|         0.15|             2900.0|
|  Male|  38|Polo|          0.2|             3477.0|
+------+----+----+-------------+-------------------+
I want the output to be:
+------+----+----+-------------+-------------------+---------+
|gender|week|type|        share|              units|    first|
+------+----+----+-------------+-------------------+---------+
|  Male|  37|Polo|         0.01|             1809.0|   1809.0|
|  Male|  37|Polo|          0.1|             2327.0|   1809.0|
|  Male|  37|Polo|         0.15|             2982.0|   1809.0|
|  Male|  37|Polo|          0.2|             3558.0|   1809.0|
|  Male|  38|Polo|         0.01|             1700.0|   1700.0|
|  Male|  38|Polo|          0.1|             2245.0|   1700.0|
|  Male|  38|Polo|         0.15|             2900.0|   1700.0|
|  Male|  38|Polo|          0.2|             3477.0|   1700.0|
+------+----+----+-------------+-------------------+---------+
How can I implement this?
Upvotes: 0
Views: 240
Reputation: 59
I found the answer, so I am posting it here. I used a window function:
from pyspark.sql import Window
m_window = Window.partitionBy(["gender", "week", "type"]).orderBy("share")
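Then I created the new column by applying the function first over that window. Because the window is ordered by share, first returns the units value of the row with the lowest share in each partition: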
from pyspark.sql.functions import first
df = df.withColumn("first", first("units").over(m_window))
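To check what the window computes without starting a Spark session, here is a rough sketch of the same logic in plain Python. It is only an illustration, not PySpark code: the helper name add_first and the tuple layout (gender, week, type, share, units) are my own assumptions, and the rows are the ones from the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (gender, week, type, share, units)
rows = [
    ("Male", 37, "Polo", 0.01, 1809.0),
    ("Male", 37, "Polo", 0.1, 2327.0),
    ("Male", 37, "Polo", 0.15, 2982.0),
    ("Male", 37, "Polo", 0.2, 3558.0),
    ("Male", 38, "Polo", 0.01, 1700.0),
    ("Male", 38, "Polo", 0.1, 2245.0),
    ("Male", 38, "Polo", 0.15, 2900.0),
    ("Male", 38, "Polo", 0.2, 3477.0),
]


def add_first(rows):
    """Append to each row the units value of the lowest-share row in its group.

    Hypothetical helper, not part of any library. A group is all rows with
    the same (gender, week, type).
    """
    group_key = itemgetter(0, 1, 2)
    # Sorting by (group key, share) stands in for partitionBy(...).orderBy("share").
    ordered = sorted(rows, key=lambda r: (group_key(r), r[3]))
    result = []
    for _, group in groupby(ordered, key=group_key):
        group = list(group)
        first_units = group[0][4]  # units of the first row in share order
        result.extend(row + (first_units,) for row in group)
    return result


for row in add_first(rows):
    print(row)
```

Every row of week 37 gets 1809.0 appended and every row of week 38 gets 1700.0, which matches the first column in the desired output.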
Upvotes: 1