Reputation: 1007
I have a PANDAs DataFrame with a MultiIndex, where one of the levels represents a year:
import pandas as pd
df = pd.DataFrame(dict(A = ['foo', 'foo', 'bar', 'bar', 'bar', 'bar'],
B = ['white', 'black', 'white', 'white', 'black', 'black'],
year = [1990, 1992, 1990, 1992, 1991, 1992],
value = [3.14, 1.20, 4.56, 6.79, 0.01, 0.02]))
df = df.set_index(['A', 'B', 'year'])
I would like to forward-fill values, but only for the intervening years in each group (defined by the interaction of A and B). Here is the input:
value
A B year
foo white 1990 3.14
black 1992 1.20
bar white 1990 4.56
1992 6.79
black 1991 0.01
1992 0.02
And here is the desired output, with one additional row:
value
A B year
foo white 1990 3.14
black 1992 1.20
bar white 1990 4.56
1991 4.56 <-- new forward-filled value
1992 6.79
black 1991 0.01
1992 0.02
How can I accomplish this concisely and efficiently? I've tried using combinations of groupby
and apply
, but I'm new to PANDAS and keep throwing Exceptions.
Here's an example of how I'm naively approaching the problem:
def ffill_years(df):
df.reset_index(['A', 'B']) # drop all but 'year'
year_range = range(df['year'].min(), df['year'].max())
df.reindex(pd.Series(years)).fillna("ffill")
return df
df.groupby(level=['A', 'B']).apply(ffill_years)
Of course this doesn't work. Any and all tips appreciated!
Upvotes: 3
Views: 474
Reputation: 52236
You were pretty close - a couple small changes:
reset_index
doesn't operate in place.index
reindex
is also not in-placemethod
See below:
def ffill_years(df):
df = df.reset_index(['A','B']) # drop all but 'year'
year_range = range(df.index.min(), df.index.max() + 1)
df = df.reindex(pd.Series(year_range)).fillna(method='ffill')
return df
Results in
In [209]: df.groupby(level=['A','B']).apply(ffill_years)
Out[209]:
A B value
A B year
bar black 1991 bar black 0.01
1992 bar black 0.02
white 1990 bar white 4.56
1991 bar white 4.56
1992 bar white 6.79
foo black 1992 foo black 1.20
white 1990 foo white 3.14
Upvotes: 3