Reputation: 675
I have a data that's as below. I need to impute the week number for each group as shown. I know the start week number for each group and the number of weeks in a year. I've tried using ffill()
function but that doesn't work in this situation. Is there any in-built function or an efficient way of doing this?
group year week week_imputed
A 2016 43 43
A 2016 44 44
A 2016 NaN 45
A 2016 NaN 46
A 2016 NaN 47
A 2016 48 48
A 2016 49 49
A 2016 50 50
A 2016 51 51
A 2016 52 52
A 2016 NaN 53
A 2017 NaN 1
A 2017 NaN 2
A 2017 NaN 3
A 2017 NaN 4
A 2017 5 5
A 2017 NaN 6
A 2017 7 7
A 2017 NaN 8
B 2016 47 47
B 2016 NaN 48
B 2016 NaN 49
B 2016 50 50
B 2016 51 51
B 2016 NaN 52
B 2017 NaN 1
B 2017 2 2
Upvotes: 1
Views: 576
Reputation: 35636
df['week_imputed'] = df.groupby([df.group, df.year]).week.fillna(method='ffill').fillna(value=1).astype(int) + \
df.groupby([df.group, df.year, df.week.notnull().cumsum()]).cumcount()
Similar answer to Pandas fillna with an incremented value except to populate the NaN
s with values initially I use ffill
, then go back through to those groups who start with NaN
and initialize them to 1s as this is your desired start value.
output:
group year week week_imputed
0 A 2016 43.0 43
1 A 2016 44.0 44
2 A 2016 NaN 45
3 A 2016 NaN 46
4 A 2016 NaN 47
5 A 2016 48.0 48
6 A 2016 49.0 49
7 A 2016 50.0 50
8 A 2016 51.0 51
9 A 2016 52.0 52
10 A 2016 NaN 53
11 A 2017 NaN 1
12 A 2017 NaN 2
13 A 2017 NaN 3
14 A 2017 NaN 4
15 A 2017 5.0 5
16 A 2017 NaN 6
17 A 2017 7.0 7
18 A 2017 NaN 8
19 B 2016 47.0 47
20 B 2016 NaN 48
21 B 2016 NaN 49
22 B 2016 50.0 50
23 B 2016 51.0 51
24 B 2016 NaN 52
25 B 2017 NaN 1
26 B 2017 2.0 2
Upvotes: 3