Reputation: 5654
I'm working with a dataframe using Pandas in which I have to replace a column if another colum value is not null.
My dataframe is something like:
v_4 v5 s_5 vt_5 ex_5 pfv pfv_cat
0-50 StoreSale Clothes 8-Apr above 100 FatimaStore Shoes
0-50 StoreSale Clothes 8-Apr 0-50 DiscountWorld Clothes
51-100 CleanShop Clothes 4-Dec 51-100 BetterUncle Shoes
So, I want to replace v_5
with pfv
where pfv
is not null, how can I achieve that?
Upvotes: 21
Views: 40423
Reputation: 331
You should consider using the numpy where function which runs much faster the all the apply methods.
Where is basically if else function for vectors. The first entry is a vector with the condition, the second the value if true and third if false. Here is how it would look:
import numpy as np
df['v_5'] = np.where(~df['pfv'].isnull(),df['pfv'],df['v_5'])
Good luck
Upvotes: 21
Reputation: 1544
Late in the game but if truly nulls
(not 'null'
strings), you could also use
df['v_5'] = df['pfv'].combine_first(df['v_5'])
which is equivalent to COALESCE()
in SQL.
Upvotes: 4
Reputation: 21
You can use Series.update from pandas
to replace with non-NA values from passed Series.
Therefore you need first to replace 'null' with NaN.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['a', 'b', 'c'],
'B': ['x', 'null', 'z']})
# Replace null with nan
df.loc[:,'B'] = df.loc[:,'B'].replace('null', np.nan)
# Update column, only if other column is non-nan (inplace)
df.A.update(df.B)
Or if you don't want to change null with NaN
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': ['a', 'b', 'c'],
'B': ['x', 'null', 'z']})
# Replace null with nan
temp = df.loc[:,'B'].replace('null', np.nan)
# Update column, only if other column is non-nan (inplace)
df.A.update(temp)
Upvotes: 2
Reputation: 862691
Because missing values are strings null
, use:
df.loc[df['pfv'].ne('null'), 'v5'] = df["pfv"]
print (df)
v_4 v5 s_5 vt_5 ex_5 pfv pfv_cat
0 0-50 StoreSale Clothes 8-Apr above 100 null Shoes
1 0-50 DiscountWorld Clothes 8-Apr 0-50 DiscountWorld Clothes
2 51-100 BetterUncle Clothes 4-Dec 51-100 BetterUncle Shoes
If missing values are NaN
or None
s (not strings) use Series.fillna
:
df['v5'] = df['pfv'].fillna(df['v5'])
print (df)
v_4 v5 s_5 vt_5 ex_5 pfv pfv_cat
0 0-50 StoreSale Clothes 8-Apr above 100 NaN Shoes
1 0-50 DiscountWorld Clothes 8-Apr 0-50 DiscountWorld Clothes
2 51-100 BetterUncle Clothes 4-Dec 51-100 BetterUncle Shoes
Upvotes: 8
Reputation: 708
My solution is the same of jezrael, but with one more step, based on an essay I made with the null problem. I've added one more row for a pfv without values.
data = [['0-50','StoreSale','Clothes','8-Apr','above 100','FatimaStore','Shoes'],
['0-50','StoreSale','Clothes','8-Apr','0-50','DiscountWorld','Clothes'],
['51-100','CleanShop','Clothes','4-Dec','51-100','BetterUncle','Shoes'],
['0-50','StoreSale','Clothes','12-Apr','above 100','','Clothes']]
First step is to handle nulls. 'df' is the DataFrame.
df = df.replace('', np.nan)
v_4 v5 s_5 vt_5 ex_5 pfv pfv_cat
0 0-50 StoreSale Clothes 8-Apr above 100 FatimaStore Shoes
1 0-50 StoreSale Clothes 8-Apr 0-50 DiscountWorld Clothes
2 51-100 CleanShop Clothes 4-Dec 51-100 BetterUncle Shoes
3 0-50 StoreSale Clothes 12-Apr above 100 NaN Clothes
Now let's update the v5 column. The command says that we will replace v5 for pfv, but if pfv is NaN we will replace with the current value of v5.
df['v5'] = df['pfv'].fillna(df['v5'])
print(df)
v_4 v5 s_5 vt_5 ex_5 pfv pfv_cat
0 0-50 FatimaStore Clothes 8-Apr above 100 FatimaStore Shoes
1 0-50 DiscountWorld Clothes 8-Apr 0-50 DiscountWorld Clothes
2 51-100 BetterUncle Clothes 4-Dec 51-100 BetterUncle Shoes
3 0-50 StoreSale Clothes 12-Apr above 100 NaN Clothes
Upvotes: 1
Reputation: 1510
You can use the apply method from pandas
and numpy
:
df['v_5'] = df.apply(lambda row: row['pfv'] if row['pfv']==np.NaN else row['v_5'], axis=1)
or without numpy :
df['v_5'] = df.apply(lambda row: row['pfv'] if pd.isnull(row['pfv']) else row['v_5'], axis=1)
Upvotes: 0