Reputation: 221
I have a dataframe that looks like this:
ItemID Attribute CostGrade RelatedTo
---------------------------------------------
01A tya
01A van
01A 03a
01A 03B
01A 02i
01A lof
01A o9g oa
01A 07N
02B ova
02B 39b
02B aga
04A val
04A rg0
04A va0
04A hla
As you can see, for each row, there are really only 2 values: the ItemID and a non-null value of either Attribute, CostGrade, or RelatedTo.
I want to convert the ItemID to a unique index, so that each ItemID has only one row, and takes any (doesn't matter which, can be first or last or random, since they are all valid and the combination is irrelevant) of the non-null values from each column. The desired output would look like this:
ItemID Attribute CostGrade RelatedTo
---------------------------------------------
01A tya 03a 03B
02B ova 39b NaN
04A hla rg0 NaN
Any help would be greatly appreciated!
Upvotes: 1
Views: 59
Reputation: 153500
Try with groupby
, bfill
, and iloc
:
df.groupby('ItemID', as_index=False).apply(lambda x: x.bfill().iloc[0])
Output:
ItemID Attribute CostGrade RelatedTo
0 01A tya 03a 03B
1 02B ova 39b NaN
2 04A val rg0 NaN
Upvotes: 2
Reputation: 1012
I think this will do what you need:
# ...if it's not already really NaN, do this
#import numpy as np
#df = df.replace('', np.nan)
df.groupby('ItemID').apply(lambda x: x.fillna(method='bfill').fillna(method='ffill')).drop_duplicates(subset='ItemID')
Doing the fillna both backwards and forwards should ensure that if there's anything in that group you'll get something.
Upvotes: 0