Reputation: 47
I used the groupby method from pandas that can group by id and time in this example csv for example:
| id | month | average tree growth (cm)|
|----|-------|-------------------------|
| 1 | 4 | 9 |
| 1 | 5 | 4 |
| 1 | 6 | 7 |
| 2 | 1 | 9 |
| 2 | 2 | 9 |
| 2 | 3 | 8 |
| 2 | 4 | 6 |
However, each id should have 12 months and I will need to fill in the average tree height at that missing month to be null value, like this:
| id | month | average tree growth (cm)|
|----|-------|-------------------------|
| 1 | 1 | nan |
| 1 | 2 | nan |
| 1 | 3 | nan |
| 1 | 4 | 9 |
| 1 | 5 | 4 |
| 1 | 6 | 7 |
| 1 | 7 | nan |
| 1 | 8 | nan |
| 1 | 9 | nan |
| 1 | 10 | nan |
| 1 | 11 | nan |
| 1 | 12 | nan |
| 2 | 1 | 9 |
This is for bokeh plotting purpose, how do I add the missing month to each id and fill the average height to nan in this case using python? Is there any easier way than brute force looping all id and check for months? Any hint would be appreciated!
Upvotes: 1
Views: 105
Reputation: 23227
One way to do it is by creating MultiIndex
and reindex by using pd.MultiIndex.from_product
and .reindex()
, as follows:
mux = pd.MultiIndex.from_product([df['id'].unique(), np.arange(1, 13)],
names=['id', 'month'])
df.set_index(['id', 'month']).reindex(mux).reset_index()
Result:
id month average tree growth (cm)
0 1 1 NaN
1 1 2 NaN
2 1 3 NaN
3 1 4 9.0
4 1 5 4.0
5 1 6 7.0
6 1 7 NaN
7 1 8 NaN
8 1 9 NaN
9 1 10 NaN
10 1 11 NaN
11 1 12 NaN
12 2 1 9.0
13 2 2 9.0
14 2 3 8.0
15 2 4 6.0
16 2 5 NaN
17 2 6 NaN
18 2 7 NaN
19 2 8 NaN
20 2 9 NaN
21 2 10 NaN
22 2 11 NaN
23 2 12 NaN
Upvotes: 2
Reputation: 11532
One possible solution is the following:
(df.groupby('id')['month']
.apply(lambda x:np.arange(1, 13))
.explode()
.reset_index()
.merge(df, how='left')
)
which produces:
id month average tree growth (cm)
0 1 1 NaN
1 1 2 NaN
2 1 3 NaN
3 1 4 9.0
4 1 5 4.0
5 1 6 7.0
6 1 7 NaN
7 1 8 NaN
8 1 9 NaN
9 1 10 NaN
10 1 11 NaN
11 1 12 NaN
12 2 1 9.0
13 2 2 9.0
14 2 3 8.0
15 2 4 6.0
16 2 5 NaN
17 2 6 NaN
18 2 7 NaN
19 2 8 NaN
20 2 9 NaN
21 2 10 NaN
22 2 11 NaN
23 2 12 NaN
Upvotes: 1