bcmyers
bcmyers

Reputation: 63

Complex case of filling NaNs in Pandas

Is there a way to go from this...

   bloomberg  morningstar  yahoo
0  AAPL1      AAPL2        NaN
1  AAPL1      NaN          AAPL3
2  NaN        GOOG4        GOOG5
3  GOOG6      GOOG4        NaN
4  IBM7       NaN          IBM8
5  NaN        IBM9         IBM8
6  NaN        NaN          FB

... to this ...

   bloomberg  morningstar  yahoo
0  AAPL1      AAPL2        AAPL3
1  GOOG6      GOOG4        GOOG5
2  IBM7       IBM9         IBM8
3  NaN        NaN          FB

... in Pandas?

I've munged my data enough to ensure that there will never be any "conflicting" information in a given column of the starting dataframe, e.g. the following is not possible...

   A column  Another column
0  AAPL1     One thing
1  AAPL1     Another thing

The only thing that can happen is that any given column either has 1) no information or 2) the right information, e.g.

   A column  Another column
0  AAPL1     NaN
1  AAPL1     The right information

All I want to do is fill the NaN's with the "right" information where available and then drop duplicates (which should be easy).

But some NaNs should remain, as I don't have enough data to infer their value, e.g. the FB row in the example.

Anybody have a good answer? Thanks for the help!

Here is some code to load the starting dataframe if you'd like to play around:

import pandas as pd
data = [
        {'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
        {'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
        {'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
        {'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
        {'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
        {'morningstar': 'IBM9', 'yahoo': 'IBM8'},
        {'yahoo': 'FB'}]
df = pd.DataFrame(data)

Upvotes: 2

Views: 301

Answers (1)

Silenced Temporarily
Silenced Temporarily

Reputation: 1004

Chaining ffill and bfill will do what you want:

df.fillna(method='ffill', axis=1).fillna(method='bfill', axis=1).drop_duplicates()

  bloomberg morningstar yahoo
0      AAPL        AAPL  AAPL
2      GOOG        GOOG  GOOG
4       IBM         IBM   IBM

Upvotes: 1

Related Questions