HappyDuppy
HappyDuppy

Reputation: 47

Easier way to fill the missing fields csv using python Pandas

I used the groupby method from pandas that can group by id and time in this example csv for example:

| id | month | average tree growth (cm)|
|----|-------|-------------------------|
|  1 |   4   |        9                |
|  1 |   5   |        4                |
|  1 |   6   |        7                |
|  2 |   1   |        9                |
|  2 |   2   |        9                |
|  2 |   3   |        8                |
|  2 |   4   |        6                |

However, each id should have 12 months and I will need to fill in the average tree height at that missing month to be null value, like this:

| id | month | average tree growth (cm)|
|----|-------|-------------------------|
|  1 |   1   |        nan              |
|  1 |   2   |        nan              |
|  1 |   3   |        nan              |
|  1 |   4   |        9                |
|  1 |   5   |        4                |
|  1 |   6   |        7                |
|  1 |   7   |        nan              |
|  1 |   8   |        nan              |
|  1 |   9   |        nan              |
|  1 |   10  |        nan              |
|  1 |   11  |        nan              |
|  1 |   12  |        nan              |
|  2 |   1   |        9                |

This is for bokeh plotting purpose, how do I add the missing month to each id and fill the average height to nan in this case using python? Is there any easier way than brute force looping all id and check for months? Any hint would be appreciated!

Upvotes: 1

Views: 105

Answers (2)

SeaBean
SeaBean

Reputation: 23227

One way to do it is by creating MultiIndex and reindex by using pd.MultiIndex.from_product and .reindex(), as follows:

mux = pd.MultiIndex.from_product([df['id'].unique(), np.arange(1, 13)],
                                 names=['id', 'month'])

df.set_index(['id', 'month']).reindex(mux).reset_index()

Result:

    id  month  average tree growth (cm)
0    1      1                       NaN
1    1      2                       NaN
2    1      3                       NaN
3    1      4                       9.0
4    1      5                       4.0
5    1      6                       7.0
6    1      7                       NaN
7    1      8                       NaN
8    1      9                       NaN
9    1     10                       NaN
10   1     11                       NaN
11   1     12                       NaN
12   2      1                       9.0
13   2      2                       9.0
14   2      3                       8.0
15   2      4                       6.0
16   2      5                       NaN
17   2      6                       NaN
18   2      7                       NaN
19   2      8                       NaN
20   2      9                       NaN
21   2     10                       NaN
22   2     11                       NaN
23   2     12                       NaN

Upvotes: 2

One possible solution is the following:

(df.groupby('id')['month']
   .apply(lambda x:np.arange(1, 13))
   .explode()
   .reset_index()
   .merge(df, how='left')
   
)

which produces:

id month  average tree growth (cm)
0    1     1                       NaN
1    1     2                       NaN
2    1     3                       NaN
3    1     4                       9.0
4    1     5                       4.0
5    1     6                       7.0
6    1     7                       NaN
7    1     8                       NaN
8    1     9                       NaN
9    1    10                       NaN
10   1    11                       NaN
11   1    12                       NaN
12   2     1                       9.0
13   2     2                       9.0
14   2     3                       8.0
15   2     4                       6.0
16   2     5                       NaN
17   2     6                       NaN
18   2     7                       NaN
19   2     8                       NaN
20   2     9                       NaN
21   2    10                       NaN
22   2    11                       NaN
23   2    12                       NaN
​

Upvotes: 1

Related Questions