eljiwo

Reputation: 846

Replace column value with other column when two other columns have equal value

So I have a dataframe like:

+--------------------+--------------+------------+-----------+-----------+-----------+-----------+
|     category       |category_new  |     value  |     body  |     legs  |     face  |     idle  |
+--------------------+--------------+------------+-----------+-----------+-----------+-----------+
| sn11               | sn11         | N          | Y         | Y         | Y         | acde      |
| sn1                | rs1          | N          | Y         | N         | N         | den       |
| sn1                | null         | Y          | N         | Y         | N         | can       |
| sn2                | rs2          | Y          | Y         | N         | N         | aeg       |
| null               | rs2          | N          | Y         | N         | Y         | ueg       |
+--------------------+--------------+------------+-----------+-----------+-----------+-----------+

I would like to replace value with face when body == legs. For example, in the first row, where body and legs are both Y, I would replace the value of value (N) with the value of face (Y).

Any idea on how to approach it?

Upvotes: 1

Views: 126

Answers (2)

busfighter

Reputation: 636

You can do that with the when function from pyspark.sql.functions:

from pyspark.sql import functions as F

df = df.withColumn('value', F.when(F.col('body') == F.col('legs'), F.col('face')).otherwise(F.col('value')))
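
To sanity-check what this expression should produce, the same rule can be sketched in plain Python on the rows from the question. This is only an illustration of the logic, not PySpark code: the helper name replace_value_when_equal is made up for this sketch, and only the value, body, legs and face columns are kept.

```python
# Plain-Python illustration of the when/otherwise rule (not PySpark code).
# Rows copied from the question; only the columns involved are kept.
rows = [
    {"value": "N", "body": "Y", "legs": "Y", "face": "Y"},
    {"value": "N", "body": "Y", "legs": "N", "face": "N"},
    {"value": "Y", "body": "N", "legs": "Y", "face": "N"},
    {"value": "Y", "body": "Y", "legs": "N", "face": "N"},
    {"value": "N", "body": "Y", "legs": "N", "face": "Y"},
]


def replace_value_when_equal(row):
    """Hypothetical helper: copy the row, taking value from face when body == legs."""
    new_row = dict(row)
    if row["body"] == row["legs"]:
        new_row["value"] = row["face"]  # the "when" branch
    # otherwise the original value is kept unchanged
    return new_row


result = [replace_value_when_equal(r) for r in rows]
print([r["value"] for r in result])
```

On this sample only the first row has body equal to legs, so only its value changes.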

Upvotes: 1

bousof

Reputation: 1251

Maybe you can try to do it using pandas.DataFrame.assign:

>>> import pandas as pd
>>> df = pd.DataFrame([
...   ['sn11','N','Y','Y','Y'],
...   ['sn1','N','Y','N','N'],
...   ['sn1','Y','N','Y','N'],
...   ['sn2','Y','Y','N','N'],
...   ['null','N','Y','N','Y']
... ], columns=['category','value','body','legs','face'])
>>> df
  category value body legs face
0     sn11     N    Y    Y    Y
1      sn1     N    Y    N    N
2      sn1     Y    N    Y    N
3      sn2     Y    Y    N    N
4     null     N    Y    N    Y
>>> df[df['body']==df['legs']] = df[df['body']==df['legs']].assign(value=df['face'])
>>> df
  category value body legs face
0     sn11     Y    Y    Y    Y
1      sn1     N    Y    N    N
2      sn1     Y    N    Y    N
3      sn2     Y    Y    N    N
4     null     N    Y    N    Y
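
A possibly simpler variant, sketched here as an alternative rather than as part of the original answer, is to build a boolean mask and assign through DataFrame.loc. The variable name mask is just a choice for this sketch; the data is the same sample as above.

```python
import pandas as pd

# Same sample data as above, with explicit column names.
df = pd.DataFrame(
    [
        ['sn11', 'N', 'Y', 'Y', 'Y'],
        ['sn1', 'N', 'Y', 'N', 'N'],
        ['sn1', 'Y', 'N', 'Y', 'N'],
        ['sn2', 'Y', 'Y', 'N', 'N'],
        ['null', 'N', 'Y', 'N', 'Y'],
    ],
    columns=['category', 'value', 'body', 'legs', 'face'],
)

# Rows where body and legs hold the same value.
mask = df['body'] == df['legs']

# Overwrite value with face on those rows only; other rows are untouched.
df.loc[mask, 'value'] = df.loc[mask, 'face']

print(df)
```

This touches only the value column, instead of reassigning whole rows.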

Upvotes: 0
