How to fill missing values in DataFrame using another DataFrame in Pandas

Question

my df looks like this:

sprint   sprint_created
------   -----------
S100     2020-01-01    
S101     2020-01-10
NULL     2020-01-20
NULL     2020-01-31
S101     2020-01-10
...

in the above df, you can see that some of the sprint values are NULL

I have another df2 that has sprint date ranges:

sprint   sprint_start   sprint_end
------   -----------    ----------
S100     2020-01-01     2020-01-09    
S101     2020-01-10     2020-01-19  
S102     2020-01-20     2020-01-29  
S103     2020-01-30     2020-02-09  
S104     2020-02-10     2020-02-19  
...

How can I map these data and fill in the NULL values in the df by comparing the data in the df2?

Please note that the shape of df and df2 are different.

David Erickson · Accepted Answer

One way would be to melt and resample your df2 and create a dictionary to map back to df1:

#make sure columns are in datetime format
df1['sprint_created'] = pd.to_datetime(df1['sprint_created'])
df2['sprint_start'] = pd.to_datetime(df2['sprint_start'])
df2['sprint_end'] = pd.to_datetime(df2['sprint_end'])

#melt dataframe of the two date columns and resample by group
new = (df2.melt(id_vars='sprint').drop('variable', axis=1).set_index('value')
          .groupby('sprint', group_keys=False).resample('D').ffill().reset_index())

#create dictionary of date and the sprint and map back to df1
dct = dict(zip(new['value'], new['sprint']))
df1['sprint'] = df1['sprint_created'].map(dct)
#or df1['sprint'] = df1['sprint'].fillna(df1['sprint_created'].map(dct))
df1
Out[1]: 
  sprint sprint_created
0   S100     2020-01-01
1   S101     2020-01-10
2   S102     2020-01-20
3   S103     2020-01-31
4   S101     2020-01-10

How to fill missing values in DataFrame using another DataFrame in Pandas

Answers (2)

Related Questions