Combine Pandas Data Frame where different columns are missing

Question

I have a pandas data frame that looks like this:

asset, cusip, information1, information2, ...., information_n
1x4,   43942, 45,           NaN,          ...., NaN
1x4,   43942, NaN,          "hello",      ...., NaN
1x4,   43942, NaN,          NaN,          ...., "goodbye"
...

What I want is:

asset, cusip, information1, information2, ...., information_n
1x4,   43942, 45,           "hello",      ...., "goodbye"
...

Essentially, I want to collapse the rows that share the same "asset" and "cusip", regardless of the other fields. Within each such group, every column from information1 to information_n has only one entry that is not NaN.

Note that some columns might be int, some strings, others floats, etc.

Vaishali · Accepted Answer

You can use groupby with first(), which returns the first non-NaN value in each column for every group. In your case, that is the only non-NaN value:

df = df.groupby(['asset', 'cusip']).first().reset_index()


    asset   cusip   information1    information2    information_n
0   1x4     43942   45              "hello"         "goodbye"
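
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and collapses them. The column values are taken from the question. The variable name `collapsed` and the use of `np.nan` for the missing entries are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question: each information column has
# exactly one non-NaN value per (asset, cusip) group.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],           # numeric column
    "information2": [np.nan, "hello", np.nan],      # string column
    "information_n": [np.nan, np.nan, "goodbye"],   # string column
})

# first() skips NaN within each group, so every column keeps its
# single populated value; reset_index() turns the group keys back
# into ordinary columns.
collapsed = df.groupby(["asset", "cusip"]).first().reset_index()
print(collapsed)
```

Two things to keep in mind: if a column has no non-NaN value at all within a group, the result for that column is NaN, and if a column has more than one non-NaN value in a group, only the first is kept and the rest are silently dropped.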
