Reputation: 751

Updating missing values in dataframe

I have a df like:

    col1    col2    col3    col4
0   a       1       jake    
1   b       1       li
2   c       2       bob     corn
3   d       2       pat     
4   e       2       angie
5   f       1       jose    pepper
6   g       3       juan
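To try the answers, the frame above can be rebuilt with the minimal sketch below. It assumes the blank col4 cells are real missing values (NaN). The answers differ on this point: one compares against '', while the others test for NaN.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; blank col4 cells are assumed to be NaN
df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col3": ["jake", "li", "bob", "pat", "angie", "jose", "juan"],
    "col4": [np.nan, np.nan, "corn", np.nan, np.nan, "pepper", np.nan],
})
print(df)
```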

What I need to do is fill in the missing col4 values for rows whose col2 value is duplicated, using the col4 value from whichever of those duplicate rows has one.

So the output would look like:

    col1    col2    col3    col4
0   a       1       jake    pepper 
1   b       1       li      pepper
2   c       2       bob     corn
3   d       2       pat     corn
4   e       2       angie   corn
5   f       1       jose    pepper
6   g       3       juan

Any thoughts?

Upvotes: 1

Answers (3)

Sebastien D

Reputation: 4482

It could be achieved like this:

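# Build {col2: col4} from rows where col4 is non-empty (blanks assumed to be '')
dictionary = df[df.col4 != ''].set_index("col2")["col4"].to_dict()
# Look up each row's col2; rows with no match get ''
df.col4 = df.col2.apply(lambda x: dictionary.get(x, ''))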

Output:

| col1   |   col2 | col3   | col4   |
|:-------|-------:|:-------|:-------|
| a      |      1 | jake   | pepper |
| b      |      1 | li     | pepper |
| c      |      2 | bob    | corn   |
| d      |      2 | pat    | corn   |
| e      |      2 | angie  | corn   |
| f      |      1 | jose   | pepper |
| g      |      3 | juan   |        |
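A self-contained version of the same dictionary-lookup idea, for reference. This sketch assumes the blanks are empty strings (as the '' comparison implies) and swaps the apply/lambda for Series.map, which performs the same per-row lookup and returns NaN for unmatched keys.

```python
import pandas as pd

# Assumption: blank col4 cells are empty strings
df = pd.DataFrame({
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col4": ["", "", "corn", "", "", "pepper", ""],
})

# {col2: col4} from non-empty rows; if one col2 value had several
# distinct col4 values, to_dict() would keep the last one seen
mapping = df[df["col4"] != ""].set_index("col2")["col4"].to_dict()

# map() looks up each col2 value; unmatched keys become NaN, then ''
df["col4"] = df["col2"].map(mapping).fillna("")
print(df["col4"].tolist())
```

Note that this overwrites every col4 cell, so it relies on each col2 group having at most one distinct non-empty col4 value.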

Upvotes: 0

Quang Hoang

Reputation: 150745

I would do a fillna with groupby().transform:

df['col4'] = df['col4'].fillna(df.groupby('col2')['col4'].transform('first'))

Output:

  col1  col2   col3    col4
0    a     1   jake  pepper
1    b     1     li  pepper
2    c     2    bob    corn
3    d     2    pat    corn
4    e     2  angie    corn
5    f     1   jose  pepper
6    g     3   juan     NaN
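A runnable sketch of the fillna plus groupby().transform approach, assuming the blanks are NaN. transform('first') broadcasts each group's first non-null col4 value back to every row of that group, and fillna only touches the missing cells, so existing values are never overwritten. If the blanks are stored as empty strings, convert them first with df["col4"].replace("", np.nan).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col4": [np.nan, np.nan, "corn", np.nan, np.nan, "pepper", np.nan],
})

# First non-null col4 per col2 group, aligned to the original rows
group_first = df.groupby("col2")["col4"].transform("first")

# Fill only the missing cells; a group with no value (col2 == 3) stays NaN
df["col4"] = df["col4"].fillna(group_first)
print(df["col4"].tolist())
```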

Upvotes: 4

Mohamed Thasin ah

Reputation: 11192

Try this:

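# Build {col2: col4} from rows that already have a col4 value
dic = df[df["col4"].notnull()].set_index("col2")["col4"].to_dict()
# Temporarily copy the col2 key into each missing col4 cell
df.loc[df["col4"].isnull(), "col4"] = df["col2"]
# Swap those keys for the mapped col4 values
df["col4"] = df["col4"].replace(dic)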

Output:

  col1  col2      col3    col4
0    a     1      jake  pepper
1    b     1        li  pepper
2    c     2       bob    corn
3    d     2       pat    corn
4    e     2     angie    corn
5    f     1      jose  pepper
6    g     3      juan       3

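You could then remove the leftover 3 (an unmatched row keeps its col2 value), for example by flagging digit-only entries with str.isdigit() after converting the column to strings.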
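As an alternative to the str.isdigit() check, the sketch below clears the leftovers by testing whether each row's col2 key appears in dic. This is a swapped-in technique, not part of the answer: it avoids wiping legitimate col4 values that happen to look like digits, and it does not depend on the leftover being a string. It assumes the blanks are NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col2": [1, 1, 2, 2, 2, 1, 3],
    "col4": [np.nan, np.nan, "corn", np.nan, np.nan, "pepper", np.nan],
})

# Steps from the answer above
dic = df[df["col4"].notnull()].set_index("col2")["col4"].to_dict()
df.loc[df["col4"].isnull(), "col4"] = df["col2"]
df["col4"] = df["col4"].replace(dic)

# Rows whose col2 never had a col4 value still hold the raw key (3 here);
# keep col4 only where col2 is a key of dic, otherwise set it to NaN
df["col4"] = df["col4"].where(df["col2"].isin(list(dic)))
print(df["col4"].tolist())
```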

Upvotes: 0
