Reputation: 63
bloomberg morningstar yahoo
0 AAPL1 AAPL2 NaN
1 AAPL1 NaN AAPL3
2 NaN GOOG4 GOOG5
3 GOOG6 GOOG4 NaN
4 IBM7 NaN IBM8
5 NaN IBM9 IBM8
6 NaN NaN FB
bloomberg morningstar yahoo
0 AAPL1 AAPL2 AAPL3
1 GOOG6 GOOG4 GOOG5
2 IBM7 IBM9 IBM8
3 NaN NaN FB
I've munged my data enough to ensure that there will never be any "conflicting" information in a given column of the starting dataframe, e.g. the following is not possible...
A column Another column
0 AAPL1 One thing
1 AAPL1 Another thing
The only thing that can happen is that any given column either has 1) no information or 2) the right information, e.g.
A column Another column
0 AAPL1 NaN
1 AAPL1 The right information
All I want to do is fill the NaN's with the "right" information where available and then drop duplicates (which should be easy).
But some NaNs should remain, as I don't have enough data to infer their value, e.g. the FB row in the example.
Here is some code to load the starting dataframe if you'd like to play around:
import pandas as pd
data = [
{'bloomberg': 'AAPL1', 'morningstar': 'AAPL2'},
{'bloomberg': 'AAPL1', 'yahoo': 'AAPL3'},
{'morningstar': 'GOOG4', 'yahoo': 'GOOG5'},
{'bloomberg': 'GOOG6', 'morningstar': 'GOOG4'},
{'bloomberg': 'IBM7', 'yahoo': 'IBM8'},
{'morningstar': 'IBM9', 'yahoo': 'IBM8'},
{'yahoo': 'FB'}]
df = pd.DataFrame(data)
Upvotes: 2
Views: 301
Reputation: 1004
Chaining ffill
and bfill
will do what you want:
df.fillna(method='ffill', axis=1).fillna(method='bfill', axis=1).drop_duplicates()
bloomberg morningstar yahoo
0 AAPL AAPL AAPL
2 GOOG GOOG GOOG
4 IBM IBM IBM
Upvotes: 1