Abdul Rehman
Abdul Rehman

Reputation: 5654

Replace a column with another column if another is not null in pandas DataFrame

I'm working with a dataframe using Pandas in which I have to replace a column if another colum value is not null.

My dataframe is something like:

v_4        v5             s_5     vt_5     ex_5          pfv           pfv_cat
0-50      StoreSale     Clothes   8-Apr   above 100   FatimaStore       Shoes
0-50      StoreSale     Clothes   8-Apr   0-50        DiscountWorld     Clothes
51-100    CleanShop     Clothes   4-Dec   51-100      BetterUncle       Shoes

So, I want to replace v_5 with pfv where pfv is not null, how can I achieve that?

Upvotes: 21

Views: 40423

Answers (6)

YagoCaruso
YagoCaruso

Reputation: 331

You should consider using the numpy where function which runs much faster the all the apply methods.

Where is basically if else function for vectors. The first entry is a vector with the condition, the second the value if true and third if false. Here is how it would look:

import numpy as np
df['v_5'] = np.where(~df['pfv'].isnull(),df['pfv'],df['v_5'])

Good luck

Upvotes: 21

yan-hic
yan-hic

Reputation: 1544

Late in the game but if truly nulls (not 'null' strings), you could also use

df['v_5'] = df['pfv'].combine_first(df['v_5'])

which is equivalent to COALESCE() in SQL.

Upvotes: 4

flork
flork

Reputation: 21

You can use Series.update from pandas to replace with non-NA values from passed Series.

Therefore you need first to replace 'null' with NaN.

Example:

import pandas as pd
import numpy as np

 df = pd.DataFrame({'A': ['a', 'b', 'c'],
               'B': ['x', 'null', 'z']})

 # Replace null with nan
 df.loc[:,'B']  =  df.loc[:,'B'].replace('null', np.nan)
 # Update column, only if other column is non-nan (inplace)
 df.A.update(df.B)

Or if you don't want to change null with NaN

import pandas as pd
import numpy as np

 df = pd.DataFrame({'A': ['a', 'b', 'c'],
               'B': ['x', 'null', 'z']})

 # Replace null with nan
 temp  =  df.loc[:,'B'].replace('null', np.nan)
 # Update column, only if other column is non-nan (inplace)
 df.A.update(temp)

Upvotes: 2

jezrael
jezrael

Reputation: 862691

Because missing values are strings null, use:

df.loc[df['pfv'].ne('null'), 'v5'] = df["pfv"]
print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100           null    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes

If missing values are NaN or Nones (not strings) use Series.fillna:

df['v5'] = df['pfv'].fillna(df['v5'])

print (df)
      v_4             v5      s_5   vt_5       ex_5            pfv  pfv_cat
0    0-50      StoreSale  Clothes  8-Apr  above 100            NaN    Shoes
1    0-50  DiscountWorld  Clothes  8-Apr       0-50  DiscountWorld  Clothes
2  51-100    BetterUncle  Clothes  4-Dec     51-100    BetterUncle    Shoes

Upvotes: 8

powerPixie
powerPixie

Reputation: 708

My solution is the same of jezrael, but with one more step, based on an essay I made with the null problem. I've added one more row for a pfv without values.

    data = [['0-50','StoreSale','Clothes','8-Apr','above 100','FatimaStore','Shoes'],
    ['0-50','StoreSale','Clothes','8-Apr','0-50','DiscountWorld','Clothes'],
    ['51-100','CleanShop','Clothes','4-Dec','51-100','BetterUncle','Shoes'],
    ['0-50','StoreSale','Clothes','12-Apr','above 100','','Clothes']]

First step is to handle nulls. 'df' is the DataFrame.

    df = df.replace('', np.nan)

          v_4         v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50  StoreSale  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  StoreSale  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100  CleanShop  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50  StoreSale  Clothes  12-Apr  above 100            NaN  Clothes

Now let's update the v5 column. The command says that we will replace v5 for pfv, but if pfv is NaN we will replace with the current value of v5.

    df['v5'] = df['pfv'].fillna(df['v5'])


    print(df)

         v_4             v5      s_5    vt_5       ex_5            pfv  pfv_cat
    0    0-50    FatimaStore  Clothes   8-Apr  above 100    FatimaStore    Shoes
    1    0-50  DiscountWorld  Clothes   8-Apr       0-50  DiscountWorld  Clothes
    2  51-100    BetterUncle  Clothes   4-Dec     51-100    BetterUncle    Shoes
    3    0-50      StoreSale  Clothes  12-Apr  above 100            NaN  Clothes

Upvotes: 1

F Blanchet
F Blanchet

Reputation: 1510

You can use the apply method from pandas and numpy:

df['v_5'] = df.apply(lambda row: row['pfv'] if row['pfv']==np.NaN else row['v_5'], axis=1)

or without numpy :

df['v_5'] = df.apply(lambda row: row['pfv'] if pd.isnull(row['pfv']) else row['v_5'], axis=1)

Upvotes: 0

Related Questions