Reputation: 297
So I wrote this snippet of code to search the entire Warehouse Column of data frame 1 for occurrences of cell values that appear 20 times or more and if the cell value satisfies this requirement, the code will write the cell value of the Warehouse column over the cell value of the GeneralDescription column. It is working but I want to add to it so it will not overwrite a cell value unless it is empty or NaN or ''.
import pandas as pd
df1.Warehouse=df1.Warehouse.str.upper()
and for some odd reason i cant indent the next line of code four spaces
df1.loc[df1.groupby('Warehouse').Warehouse.transform('count').gt(20),'GeneralDescription']=df1.Warehouse
Upvotes: 1
Views: 1261
Reputation: 402413
Augment your masking condition, AND it with another that indicates whether a column contains one of those empty values.
i = df1.groupby('Warehouse').Warehouse.transform('count').gt(20)
j = df1.Warehouse.isin([np.nan, 'Empty', ''])
df1.loc[i & j, 'GeneralDescription'] = df1.Warehouse
Alternatively, when loading in your data, specify na_values
so everything becomes NaN:
df = pd.read_csv(..., na_values=['Empty', '', 'NaN', 'nan'])
And you can then just do:
i = df1.groupby('Warehouse').Warehouse.transform('count').gt(20)
j = df1.Warehouse.isnull()
df1.loc[i & j, 'GeneralDescription'] = df1.Warehouse
Which is arguably faster.
Upvotes: 2