Reputation: 346
i have a pandas dataframe like the following:
How do I fill up the empty cells with same policy numbers for same product type as they exist?
Any suggestion would be very much appreciated. Thank you
Sorry for the confusion, I am adding my sample dataframe now:
sample=[{'POLICY NUMBER':' ','PRODUCT TYPE':'MED'},{'POLICY NUMBER':' ','PRODUCT TYPE':'MED'},{'POLICY NUMBER':'433M49763','PRODUCT TYPE':'MED'},{'POLICY NUMBER':'433M86968','PRODUCT TYPE':'MED'},{'POLICY NUMBER':' ','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'566D158635 ','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'655D158635','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'789D158635','PRODUCT TYPE':'TED'}]
pd.DataFrame(sample)
please note that the empty cells have " " in them too, they are not NaN across the whole dataframe
Adding to the question above. If I have the altered dataframe as above. How do i get to the following dataframe:
Upvotes: 2
Views: 3066
Reputation: 863801
I think you need groupby
+ transform
:
If only one same category per group and no data are empty string
s :
df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
.transform(lambda x: x[x != ''].iat[0]))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
Or if posible there are not always empty stings, but sometimes there are wtrailing whitespaces
, need strip
:
df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip().groupby(df['PRODUCT TYPE'])
.transform(lambda x: x[x != ''].iat[0]))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
Solution with sorting and transform last
value:
df['POLICY NUMBER'] = (df.sort_values(['PRODUCT TYPE','POLICY NUMBER'])
.groupby('PRODUCT TYPE')['POLICY NUMBER']
.transform('last'))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
EDIT: You need replace empty strings by NaN
s and then use bfill
for back forward filling NaN
s with ffill
for forward fillin NaNs:
df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip()
.replace('',np.nan)
.groupby(df['PRODUCT TYPE'])
.transform(lambda x: x.bfill().ffill()))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M49763 MED
1 433M49763 MED
2 433M49763 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 789D158635 TED
Upvotes: 6