Royale_w_cheese
Royale_w_cheese

Reputation: 297

Update values in a column with another only if they are blank, empty or NaN

So I wrote this snippet of code to search the entire Warehouse Column of data frame 1 for occurrences of cell values that appear 20 times or more and if the cell value satisfies this requirement, the code will write the cell value of the Warehouse column over the cell value of the GeneralDescription column. It is working but I want to add to it so it will not overwrite a cell value unless it is empty or NaN or ''.

import pandas as pd 

df1.Warehouse=df1.Warehouse.str.upper() 

and for some odd reason i cant indent the next line of code four spaces

df1.loc[df1.groupby('Warehouse').Warehouse.transform('count').gt(20),'GeneralDescription']=df1.Warehouse 

Upvotes: 1

Views: 1261

Answers (1)

cs95
cs95

Reputation: 402413

Augment your masking condition, AND it with another that indicates whether a column contains one of those empty values.

i = df1.groupby('Warehouse').Warehouse.transform('count').gt(20)
j = df1.Warehouse.isin([np.nan, 'Empty', ''])

df1.loc[i & j, 'GeneralDescription'] = df1.Warehouse 

Alternatively, when loading in your data, specify na_values so everything becomes NaN:

df = pd.read_csv(..., na_values=['Empty', '', 'NaN', 'nan'])

And you can then just do:

i = df1.groupby('Warehouse').Warehouse.transform('count').gt(20)
j = df1.Warehouse.isnull()
df1.loc[i & j, 'GeneralDescription'] = df1.Warehouse 

Which is arguably faster.

Upvotes: 2

Related Questions