Reputation: 719
I have a DF like so:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 NaN
Maria Rice 2016 NaN
Jack Steak 2011 NaN
Jack Steak 2012 5
Jack Steak 2013 NaN
I want the output to look like this:
Name Food Year_eaten Month_eaten
Maria Rice 2014 3
Maria Rice 2015 3
Maria Rice 2016 3
Jack Steak 2011 5
Jack Steak 2012 5
Jack Steak 2013 5
I want to fill in the NaN's according to this condition:
If the row's Name, Food is the same and the Year's are consecutive:
Fill the NaN's with the Month_eaten corresponding to the row that isn't a NaN
There will be a person that has all NaN's for the Month_eaten, but I don't need to worry about that for now. Only the one's with at least one value for the Month_eaten in any of the years.
Any thoughts would be appreciated!
Upvotes: 5
Views: 65
Reputation: 402493
You can group on "Name", "Food", and a custom column created by diff
ing the rows of "Year_eaten".
u = df.Year_eaten.diff().bfill().ne(1).cumsum()
v = df.groupby(['Name','Food', v]).Month_eaten.transform('first')
df['Month_eaten'] = df.Month_eaten.fillna(v, downcast='infer')
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
Another solution, if no group has all rows with NaN, is using groupby
and ffill
(everything else is the same).
df['Month_eaten'] = df.groupby(['Name','Food', u]).Month_eaten.ffill().bfill()
df
Name Food Year_eaten Month_eaten
0 Maria Rice 2014 3
1 Maria Rice 2015 3
2 Maria Rice 2016 3
3 Jack Steak 2011 5
4 Jack Steak 2012 5
5 Jack Steak 2013 5
Upvotes: 3
Reputation: 323226
Using diff().ne(1).cumsum()
create the continue year group key
continueyear=df.groupby(['Name','Food']).Year_eaten.apply(lambda x : x.diff().ne(1).cumsum())
Then using groupby
with apply
ffill
and bfill
df.groupby([df.Name,df.Food,continueyear]).Month_eaten.apply(lambda x : x.ffill().bfill().astype(int))
Out[26]:
0 3
1 3
2 3
3 5
4 5
5 5
Name: Month_eaten, dtype: int32
Upvotes: 1