Reputation: 4767
I have a DataFrame that looks like this, with a MultiIndex over ('timestamp', 'id'):
value
timestamp id
2020-03-03 A 100
2020-03-03 B 222
2020-03-03 C 5000
2020-03-04 A NaN
2020-03-04 B 1
2020-03-04 C NaN
2020-03-05 A 200
2020-03-05 B NaN
2020-03-05 C NaN
2020-03-06 A NaN
2020-03-06 B 20
2020-03-06 C NaN
I want to forward-fill value over time, separately for each id, so that the DataFrame is populated with the most recently available data item, i.e. the DataFrame becomes:
value
timestamp id
2020-03-03 A 100
2020-03-03 B 222
2020-03-03 C 5000
2020-03-04 A 100
2020-03-04 B 1
2020-03-04 C 5000
2020-03-05 A 200
2020-03-05 B 1
2020-03-05 C 5000
2020-03-06 A 200
2020-03-06 B 20
2020-03-06 C 5000
Is there an easy way to do this, for example using a resampler?
Upvotes: 2
Views: 1934
Reputation: 4767
You can also use unstack to arrange the data in a proper 2D representation (one column per id) for filling column-wise, and then stack it back to the original format. This fills each id separately instead of rolling values over from one id into the next, which is what can happen in the other solution given.
import numpy as np
import pandas as pd

a = ['2020-03-03', '2020-03-04', '2020-03-05', '2020-03-06']
b = ['A', 'B', 'C']
c = ['value1', 'value2']
# Start from an all-NaN float frame indexed by every (timestamp, id) pair
idx = pd.MultiIndex.from_product([a, b], names=['timestamp', 'id'])
df = pd.DataFrame(np.nan, index=idx, columns=c)
df.loc[('2020-03-03', slice(None)), 'value1'] = np.array([100, 222, 5000])
df.loc[('2020-03-04', 'B'), 'value1'] = 1.0
df.loc[('2020-03-05', 'A'), 'value1'] = 200.0
df.loc[('2020-03-06', 'C'), 'value1'] = 20
# value2 is a copy of value1, except that 'C' has no value on the first day
df['value2'] = df['value1']
df.loc[('2020-03-03', 'C'), 'value2'] = np.nan
df
value1 value2
timestamp id
2020-03-03 A 100 100
2020-03-03 B 222 222
2020-03-03 C 5000 NaN # <- OBS!
2020-03-04 A NaN NaN
2020-03-04 B 1 1
2020-03-04 C NaN NaN
2020-03-05 A 200 200
2020-03-05 B NaN NaN
2020-03-05 C NaN NaN
2020-03-06 A NaN NaN
2020-03-06 B NaN NaN
2020-03-06 C 20 20
Using df.unstack().ffill() (the older spelling df.unstack().fillna(method='ffill') is deprecated in recent pandas versions) gives
value1 value2
A B C A B C
timestamp
2020-03-03 100 222 5000 100 222 NaN
2020-03-04 100 1 5000 100 1 NaN
2020-03-05 200 1 5000 200 1 NaN
2020-03-06 200 1 20 200 1 20
This can be converted back to the original format with .stack().
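As a minimal sketch of this round trip, here it is applied to the single-column data from the question. The names wide, filled_wide and filled are illustrative, and the final reindex is an added safeguard (not part of the original answer) that restores the original row order and any rows stack may have dropped:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: one 'value' column over (timestamp, id)
idx = pd.MultiIndex.from_product(
    [["2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"], ["A", "B", "C"]],
    names=["timestamp", "id"],
)
df = pd.DataFrame(
    {"value": [100, 222, 5000, np.nan, 1, np.nan,
               200, np.nan, np.nan, np.nan, 20, np.nan]},
    index=idx,
)

wide = df.unstack("id")        # one row per timestamp, one column per id
filled_wide = wide.ffill()     # fill down each column, i.e. forward in time per id
# Move 'id' back into the index; reindex guards against reordered or dropped rows
filled = filled_wide.stack("id").reindex(df.index)
print(filled)
```

Because each id sits in its own column while filling, a value can only be carried forward within that id.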
Compared with df.sort_index(level=1).ffill().reindex(df.index), the difference is in the last column ('value2', id 'C'): because 'C' starts with a NaN, the sorted approach rolls the last value of 'B' (1) into the start of 'C'.
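The difference can be checked with a small sketch on the 'value2' column alone. The Series below is rebuilt by hand from the setup code above, and sorted_fill and per_id_fill are illustrative names, not part of either answer:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"], ["A", "B", "C"]],
    names=["timestamp", "id"],
)
# 'value2' from the example: id 'C' has no observation on the first day
s = pd.Series(
    [100, 222, np.nan, np.nan, 1, np.nan,
     200, np.nan, np.nan, np.nan, np.nan, 20],
    index=idx, name="value2", dtype=float,
)

# Approach 1: sort by id, fill straight down, restore the original order.
# Rows of different ids are adjacent, so B's last value can leak into C.
sorted_fill = s.sort_index(level="id").ffill().reindex(s.index)

# Approach 2: one column per id, fill each column, reshape back.
# reindex restores rows that stack may drop because they are still NaN.
per_id_fill = s.unstack("id").ffill().stack().reindex(s.index)

print(sorted_fill.loc[("2020-03-03", "C")])   # 1.0, leaked from 'B'
print(per_id_fill.loc[("2020-03-03", "C")])   # nan, nothing earlier for 'C'
```

The two results agree for 'A' and 'B' here and differ only where an id starts with missing data.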
Upvotes: 2
Reputation: 75080
You can sort by the second index level, ffill, then reindex to restore the original order:
df.sort_index(level=1).ffill().reindex(df.index)
value
timestamp id
2020-03-03 A 100.0
B 222.0
C 5000.0
2020-03-04 A 100.0
B 1.0
C 5000.0
2020-03-05 A 200.0
B 1.0
C 5000.0
2020-03-06 A 200.0
B 20.0
C 5000.0
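For reference, a self-contained sketch that rebuilds the question's data and applies this one-liner. The construction of df is an assumption about how the frame was built, and result is an illustrative name. Note that this relies on every id having a value at its first timestamp, as is the case in the question's data:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["2020-03-03", "2020-03-04", "2020-03-05", "2020-03-06"], ["A", "B", "C"]],
    names=["timestamp", "id"],
)
df = pd.DataFrame(
    {"value": [100, 222, 5000, np.nan, 1, np.nan,
               200, np.nan, np.nan, np.nan, 20, np.nan]},
    index=idx,
)

# Group each id's rows together (in time order), fill down,
# then put the rows back in the original (timestamp, id) order.
result = df.sort_index(level="id").ffill().reindex(df.index)
print(result)
```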
Upvotes: 4