Reputation: 751
I have a df like:
   col1  col2   col3    col4
0     a     1   jake
1     b     1     li
2     c     2    bob    corn
3     d     2    pat
4     e     2  angie
5     f     1   jose  pepper
6     g     3   juan
I need to fill in the missing col4 values for rows that share a col2 value with other rows, whenever at least one row in that group has a col4 value.
So the output would look like:
   col1  col2   col3    col4
0     a     1   jake  pepper
1     b     1     li  pepper
2     c     2    bob    corn
3     d     2    pat    corn
4     e     2  angie    corn
5     f     1   jose  pepper
6     g     3   juan
Any thoughts?
Upvotes: 1
Views: 48
Reputation: 4482
It can be done like this, assuming the missing values in col4 are empty strings:
dictionary = df[~(df.col4=='')].set_index("col2")["col4"].to_dict()
df.col4 = df.col2.apply(lambda x : dictionary[x] if x in dictionary.keys() else '')
Output:
| col1 | col2 | col3 | col4 |
|:-------|-------:|:-------|:-------|
| a | 1 | jake | pepper |
| b | 1 | li | pepper |
| c | 2 | bob | corn |
| d | 2 | pat | corn |
| e | 2 | angie | corn |
| f | 1 | jose | pepper |
| g | 3 | juan | |
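For anyone who wants to run this end to end, here is a minimal sketch of the same lookup idea written with `Series.map` instead of `apply`. It assumes, as the snippet above does, that the blanks in col4 are empty strings; the DataFrame is rebuilt from the question and the name `lookup` is only illustrative.

```python
import pandas as pd

# Sample data from the question, with blanks stored as empty strings (assumption).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col3": ["jake", "li", "bob", "pat", "angie", "jose", "juan"],
    "col4": ["", "", "corn", "", "", "pepper", ""],
})

# Map each col2 value to a known col4 value, using only the non-blank rows.
lookup = df.loc[df["col4"] != ""].set_index("col2")["col4"].to_dict()

# Look up every row's col2; groups with no known value come back as NaN,
# which is turned back into an empty string.
df["col4"] = df["col2"].map(lookup).fillna("")

print(df)
```

Like the original snippet, this overwrites the whole column, so if one col2 group held several different col4 values, every row in that group would end up with the last one.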
Upvotes: 0
Reputation: 150745
I would do a fillna with groupby().transform:
df['col4'] = df['col4'].fillna(df.groupby('col2')['col4'].transform('first'))
Output:
   col1  col2   col3    col4
0     a     1   jake  pepper
1     b     1     li  pepper
2     c     2    bob    corn
3     d     2    pat    corn
4     e     2  angie    corn
5     f     1   jose  pepper
6     g     3   juan     NaN
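A self-contained sketch of the same one-liner, split into two steps so the intermediate result is visible. It assumes the blanks in col4 are real NaN values (if they are empty strings, they would need converting first, for example with `df["col4"].replace("", np.nan)`); the name `first_per_group` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with blanks stored as NaN (assumption).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col3": ["jake", "li", "bob", "pat", "angie", "jose", "juan"],
    "col4": [np.nan, np.nan, "corn", np.nan, np.nan, "pepper", np.nan],
})

# For every row, the first non-null col4 value found in its col2 group.
# Groups with no value at all (col2 == 3 here) stay missing.
first_per_group = df.groupby("col2")["col4"].transform("first")

# Only the missing entries are filled; existing values are left untouched.
df["col4"] = df["col4"].fillna(first_per_group)

print(df)
```

Because `fillna` only touches missing entries, rows that already have a col4 value keep it even if another row in the same group has a different one.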
Upvotes: 4
Reputation: 11192
Try this:
dic = df[df["col4"].notnull()].set_index("col2")["col4"].to_dict()
df.loc[df["col4"].isnull(), "col4"]=df["col2"]
df["col4"] = df["col4"].replace(dic)
Output:
   col1  col2   col3    col4
0     a     1   jake  pepper
1     b     1     li  pepper
2     c     2    bob    corn
3     d     2    pat    corn
4     e     2  angie    corn
5     f     1   jose  pepper
6     g     3   juan       3
You could then remove the leftover 3 by detecting it with str.isdigit().
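A rough sketch of that cleanup step, run on the question's sample data. The leftover placeholder is an integer rather than a string, so this version casts to `str` before calling `str.isdigit()`; that cast, the explicit `dtype=object`, and the name `leftover` are my additions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with blanks stored as NaN (assumption).
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col3": ["jake", "li", "bob", "pat", "angie", "jose", "juan"],
    "col4": pd.Series(
        [np.nan, np.nan, "corn", np.nan, np.nan, "pepper", np.nan], dtype=object
    ),
})

# The three steps from the answer: build the lookup, drop the col2 key into
# the blank cells, then swap keys for their col4 values.
dic = df[df["col4"].notnull()].set_index("col2")["col4"].to_dict()
df.loc[df["col4"].isnull(), "col4"] = df["col2"]
df["col4"] = df["col4"].replace(dic)

# Cleanup: any cell that is still purely digits is a col2 key with no match.
leftover = df["col4"].astype(str).str.isdigit()
df.loc[leftover, "col4"] = np.nan

print(df)
```

Note that this would also blank out any genuine col4 value made up only of digits, so it is only safe when col4 never legitimately holds numbers.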
Upvotes: 0