Reputation: 846
So I have a dataframe like:
+----------+--------------+-------+------+------+------+------+
| category | category_new | value | body | legs | face | idle |
+----------+--------------+-------+------+------+------+------+
| sn11     | sn11         | N     | Y    | Y    | Y    | acde |
| sn1      | rs1          | N     | Y    | N    | N    | den  |
| sn1      | null         | Y     | N    | Y    | N    | can  |
| sn2      | rs2          | Y     | Y    | N    | N    | aeg  |
| null     | rs2          | N     | Y    | N    | Y    | ueg  |
+----------+--------------+-------+------+------+------+------+
I would like to replace value with face when body == legs. For example, in the first row body and legs are both Y, so value (N) should be replaced with the value of face (Y).
Any idea on how to approach it?
Upvotes: 1
Views: 126
Reputation: 636
You can do that with the when function from pyspark.sql.functions:
from pyspark.sql import functions as F
df = df.withColumn('value', F.when(F.col('body') == F.col('legs'), F.col('face')).otherwise(F.col('value')))
Upvotes: 1
Reputation: 1251
Maybe you can try to do it using pandas.DataFrame.assign:
>>> import pandas as pd
>>> df = pd.DataFrame([
... ['sn11','N','Y','Y','Y'],
... ['sn1','N','Y','N','N'],
... ['sn1','Y','N','Y','N'],
... ['sn2','Y','Y','N','N'],
... ['null','N','Y','N','Y']
... ], columns=['category', 'value', 'body', 'legs', 'face'])
>>> df
category value body legs face
0 sn11 N Y Y Y
1 sn1 N Y N N
2 sn1 Y N Y N
3 sn2 Y Y N N
4 null N Y N Y
>>> df[df['body']==df['legs']] = df[df['body']==df['legs']].assign(value=df['face'])
>>> df
category value body legs face
0 sn11 Y Y Y Y
1 sn1 N Y N N
2 sn1 Y N Y N
3 sn2 Y Y N N
4 null N Y N Y
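If the data is already in pandas, a boolean mask with .loc might be a more direct way to get the same result. This is only a sketch on the same sample data; the variable name same is made up for illustration.

```python
import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame(
    [
        ['sn11', 'N', 'Y', 'Y', 'Y'],
        ['sn1', 'N', 'Y', 'N', 'N'],
        ['sn1', 'Y', 'N', 'Y', 'N'],
        ['sn2', 'Y', 'Y', 'N', 'N'],
        ['null', 'N', 'Y', 'N', 'Y'],
    ],
    columns=['category', 'value', 'body', 'legs', 'face'],
)

# Boolean mask: True on rows where body and legs hold the same value.
same = df['body'] == df['legs']

# Overwrite value with face on the masked rows only; other rows are untouched.
df.loc[same, 'value'] = df.loc[same, 'face']

print(df)
```

Only the first row has body equal to legs here, so that is the only row whose value changes.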
Upvotes: 0