Best way to find the first non-null occurrence of each column for each index?

Question

I have a dataframe that looks like this:

ItemID    Attribute    CostGrade    RelatedTo
---------------------------------------------
01A       tya        
01A       van
01A                     03a
01A                                 03B
01A                     02i
01A       lof           
01A                     o9g oa      
01A                                 07N
02B       ova           
02B                     39b         
02B       aga
04A       val
04A                     rg0
04A                     va0
04A       hla

As you can see, each row really holds only two values: the ItemID and a single non-null value in one of Attribute, CostGrade, or RelatedTo.

I want to make ItemID a unique index, so that each ItemID has exactly one row holding one non-null value from each column. It doesn't matter which value is taken (first, last, or random), since they are all valid and the combination is irrelevant. The desired output would look like this:

ItemID    Attribute    CostGrade    RelatedTo
---------------------------------------------
01A       tya          03a          03B
02B       ova          39b          NaN
04A       hla          rg0          NaN

Any help would be greatly appreciated!

Scott Boston · Accepted Answer

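Try groupby with bfill and iloc. Within each ItemID group, bfill pulls the next non-null value in each column up to the first row, and iloc[0] then selects that row: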

df.groupby('ItemID', as_index=False).apply(lambda x: x.bfill().iloc[0])

Output:

  ItemID Attribute CostGrade RelatedTo
0    01A       tya       03a       03B
1    02B       ova       39b       NaN
2    04A       val       rg0       NaN
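
As a possible alternative, groupby's first() also returns the first non-null entry of each column per group, so it may be worth trying in place of the apply/bfill combination. The sketch below is not part of the original answer. It rebuilds a shortened version of the question's data by hand and assumes the blank cells are real NaN values; if they are empty strings in your data, they would need converting to NaN first.

```python
import numpy as np
import pandas as pd

# Shortened, hand-built version of the question's data (an assumption:
# blanks are stored as NaN, not as empty strings).
df = pd.DataFrame({
    "ItemID":    ["01A", "01A", "01A", "01A", "02B", "02B", "04A", "04A"],
    "Attribute": ["tya", np.nan, np.nan, "lof", "ova", np.nan, "val", np.nan],
    "CostGrade": [np.nan, "03a", np.nan, np.nan, np.nan, "39b", np.nan, "rg0"],
    "RelatedTo": [np.nan, np.nan, "03B", np.nan, np.nan, np.nan, np.nan, np.nan],
})

# first() skips nulls column by column, so each ItemID collapses to one row
# holding the first non-null value found in each column.
result = df.groupby("ItemID", as_index=False).first()
print(result)
```

Groups with no non-null value in a column (RelatedTo for 02B and 04A here) stay null in the result. Swapping first() for last() would take the last non-null value instead, which is equally acceptable given the question.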
