user3437212
user3437212

Reputation: 675

How can I fill missing data sequentially in pandas?

I have a data that's as below. I need to impute the week number for each group as shown. I know the start week number for each group and the number of weeks in a year. I've tried using ffill() function but that doesn't work in this situation. Is there any in-built function or an efficient way of doing this?

group   year    week    week_imputed
A      2016     43          43
A   2016        44          44
A   2016       NaN          45
A   2016       NaN          46
A   2016       NaN          47
A   2016       48           48
A   2016       49           49
A   2016       50           50
A   2016       51           51
A   2016       52           52
A   2016       NaN          53
A   2017       NaN          1
A   2017       NaN          2
A   2017       NaN          3
A   2017       NaN          4
A   2017        5           5
A   2017       NaN          6
A   2017        7           7
A   2017       NaN          8
B   2016       47          47
B   2016       NaN         48
B   2016       NaN         49
B   2016       50          50
B   2016       51          51
B   2016       NaN         52
B   2017       NaN         1
B   2017       2           2

Upvotes: 1

Views: 576

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35636

df['week_imputed'] = df.groupby([df.group, df.year]).week.fillna(method='ffill').fillna(value=1).astype(int) + \
                     df.groupby([df.group, df.year, df.week.notnull().cumsum()]).cumcount()

Similar answer to Pandas fillna with an incremented value except to populate the NaNs with values initially I use ffill, then go back through to those groups who start with NaN and initialize them to 1s as this is your desired start value.

output:

   group  year  week  week_imputed
0      A  2016  43.0            43
1      A  2016  44.0            44
2      A  2016   NaN            45
3      A  2016   NaN            46
4      A  2016   NaN            47
5      A  2016  48.0            48
6      A  2016  49.0            49
7      A  2016  50.0            50
8      A  2016  51.0            51
9      A  2016  52.0            52
10     A  2016   NaN            53
11     A  2017   NaN             1
12     A  2017   NaN             2
13     A  2017   NaN             3
14     A  2017   NaN             4
15     A  2017   5.0             5
16     A  2017   NaN             6
17     A  2017   7.0             7
18     A  2017   NaN             8
19     B  2016  47.0            47
20     B  2016   NaN            48
21     B  2016   NaN            49
22     B  2016  50.0            50
23     B  2016  51.0            51
24     B  2016   NaN            52
25     B  2017   NaN             1
26     B  2017   2.0             2

Upvotes: 3

Related Questions