Reputation: 611
I have a dataframe
df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])
I need to make Nan/blank values for respective _num and _deno for column name. Ex: Make values Nan/blank for "col1_num" and "col1_deno" if particular row of "col1" is blank. Repeat the same process for "col2_num" and "col2_deno" based on "col2", and for "col3_num" and "col3_deno" based on "col3".
Expected Output:
df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])
How to do it?
Upvotes: 0
Views: 56
Reputation: 862511
Solution with MultiIndex
:
#first convert not processing and testing columns to index
df1 = df.set_index(['id','date'])
cols = df1.columns
#split columns by _ for MultiIndex
df1.columns = df1.columns.str.split('_', expand=True)
#compare columns without _ (with NaN in second level) by empty string
m = df1.xs(np.nan, axis=1, level=1).eq('')
#create mask by all columns
mask = m.reindex(df1.columns, axis=1, level=0)
#set new values by mask, overwrite columns names
df1 = df1.mask(mask, '').set_axis(cols, axis=1).reset_index()
print (df1)
id date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num \
0 A 1 98 88 567 453 323
1 B 1 99 231 232
2 C 1 97 67 23 543 458 456 876 935
3 B 1 79 84 678 452 545
4 A 1 45 58 334 778 234 983
5 C 1 23 55 183 565 234
col2_deno
0 756
1
2 876
3 453
4
5 234
Upvotes: 1
Reputation: 28644
@shubham's answer is simple and to the point and I believe faster as well; this is just an option, where you may not be able to (or want to) list all the columns
Get the list of columns that need to be changed:
cols = [col for col in df if col.startswith('col')]
['col1',
'col2',
'col3',
'col1_num',
'col1_deno',
'col3_num',
'col3_deno',
'col2_num',
'col2_deno']
Create a dictionary pairing col1 to the columns to be changed, same for col2 and so on:
from collections import defaultdict
d = defaultdict(list)
for col in cols:
if "_" in col:
d[col.split("_")[0]].append(col)
d
defaultdict(list,
{'col1': ['col1_num', 'col1_deno'],
'col3': ['col3_num', 'col3_deno'],
'col2': ['col2_num', 'col2_deno']})
Iterate through the dict to assign the new values:
for key, val in d.items():
df.loc[df[key].eq(""), val] = ""
id date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num col2_deno
0 A 1 98 88 567 453 323 756
1 B 1 99 231 232
2 C 1 97 67 23 543 458 456 876 935 876
3 B 1 79 84 678 452 545 453
4 A 1 45 58 334 778 234 983
5 C 1 23 55 183 565 234 234
Upvotes: 3
Reputation: 71689
Let us try with boolean masking:
# select the columns
c = pd.Index(['col1', 'col2', 'col3'])
# create boolean mask
m = df[c].eq('').to_numpy()
# mask the values in `_num` and `_deno` like columns
df[c + '_num'] = df[c + '_num'].mask(m, '')
df[c + '_deno'] = df[c + '_deno'].mask(m, '')
>>> df
id date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num col2_deno
0 A 1 98 88 567 453 323 756
1 B 1 99 231 232
2 C 1 97 67 23 543 458 456 876 935 876
3 B 1 79 84 678 452 545 453
4 A 1 45 58 334 778 234 983
5 C 1 23 55 183 565 234 234
Upvotes: 3