Reputation: 1145
Let's say I have my main DataFrame.
df = pd.DataFrame({'ID': [1,1,1,2,2,2,3,3,3],
'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-01', '2021-01-02', '2021-01-03','2021-01-01', '2021-01-02', '2021-01-03'] ,
'Values': [11, np.nan, np.nan, 13, np.nan, np.nan, 15, np.nan, np.nan],
'Random_Col': [0,0,0,0,0,0,0,0,0]})
I want to fill the np.nan values with values from another dataframe that is not the same shape. The values have to match on "ID" and "Date".
new_df = pd.DataFrame({'ID': [1,1,2,2,3,3],
'Date': ['2021-01-02', '2021-01-03', '2021-01-02', '2021-01-03','2021-01-02','2021-01-03'],
'Values': [16, 19, 14, 14, 19, 18]})
What's the best way to do this?
I experimented with df.update(), but I'm not that works since the dataframes do not have the same number of rows. Am I wrong about this?
I could also use pd.merge(), but then I end up with multiple versions of each column and have to .fillna() for each specific column with the 2nd column with the new values. This would be fine if I only had 1 column of data to do this for, but I have dozens.
Is there a simpler way that I haven't considered?
Upvotes: 0
Views: 312
Reputation:
One option is to merge
+ sort_index
+ bfill
to fill the missing data in df
, then reindex with df.columns
. Since '\x00'
has the lowest value, the sorting should place the same column names next to each other.
out = (df.merge(new_df, on=['ID','Date'], how='left', suffixes=('','\x00'))
.sort_index(axis=1).bfill(axis=1)[df.columns])
Output:
ID Date Values Random_Col
0 1 2021-01-01 11.0 0
1 1 2021-01-02 16.0 0
2 1 2021-01-03 19.0 0
3 2 2021-01-01 13.0 0
4 2 2021-01-02 14.0 0
5 2 2021-01-03 14.0 0
6 3 2021-01-01 15.0 0
7 3 2021-01-02 19.0 0
8 3 2021-01-03 18.0 0
Upvotes: 1