How to modify dataframe based on column values

Question

I want to add relationships to column 'relations' based on rel_list. Specifically, for each tuple, i.e. ('a', 'b'), I want to replace the relationships column value '' with 'b' in the first row, but no duplicate, meaning that for the 2nd row, don't replace '' with 'a', since they are considered as duplicated. The following code doesn't work fully correct:

import pandas as pd

data = {
  "names": ['a', 'b', 'c', 'd'],
  "ages": [50, 40, 45, 20],
  "relations": ['', '', '', '']
}
rel_list = [('a', 'b'), ('a', 'c'), ('c', 'd')]

df = pd.DataFrame(data)

for rel_tuple in rel_list:
  head = rel_tuple[0]
  tail = rel_tuple[1]

  df.loc[df.names == head, 'relations'] = tail

print(df)

The current result of df is:

     names  ages relations
0     a    50         c
1     b    40          
2     c    45         d
3     d    20

However, the correct one is:

    names  ages relations
0     a    50         b
0     a    50         c
1     b    40          
2     c    45         d
3     d    20

There are new rows that need to be added. The 2nd row in this case, like above. How to do that?

mozway · Accepted Answer

You can craft a dataframe and merge:

(df.drop('relations', axis=1)
   .merge(pd.DataFrame(rel_list, columns=['names', 'relations']),
          on='names',
          how='outer'
         )
  # .fillna('') # uncomment to replace NaN with empty string
 )

Output:

  names  ages relations
0     a    50         b
1     a    50         c
2     b    40       NaN
3     c    45         d
4     d    20       NaN

How to modify dataframe based on column values

Answers (2)

Related Questions