Leonardo Viotti
Leonardo Viotti

Reputation: 506

Pandas interpolation adding rows by group with different ranges for each group

I am trying to add rows to a DataFrame interpolating values in a column by group, and fill with missing all other columns. My data looks something like this:

import pandas as pd 
import random

random.seed(42)
data = {'group':['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c' ],
        'value' : [1, 2, 5,  3, 4, 5, 7, 4, 7, 9],
        'other': random.sample(range(1, 100), 10)}

df = pd.DataFrame(data)
print(df)
  group  value  other
0     a      1     82
1     a      2     15
2     a      5      4
3     b      3     95
4     b      4     36
5     b      5     32
6     b      7     29
7     c      4     18
8     c      7     14
9     c      9     87

What I am trying to achieve is something like this:

   group  value  other
      a      1     82
      a      2     15
      a      3     NaN
      a      4     NaN
      a      5     NaN
      b      3     95
      b      4     36
      b      5     32
      b      6     NaN
      b      7     29
      c      4     18
      c      5     NaN
      c      6     NaN
      c      7     14
      c      8     NaN
      c      9     87

For example, group a has a range from 1 to 5, b from 3 to 7, and c from 4 to 9.

The issue I'm having is that each group has a different range. I found something that works assuming a single range for all groups. This could work using the global min and max and dropping extra rows in each group, but since my data is fairly large adding many rows per group quickly becomes unfeasible.

Upvotes: 3

Views: 210

Answers (2)

sammywemmy
sammywemmy

Reputation: 28699

One option is with the complete function from pyjanitor, which can be helpful in exposing explicitly missing rows (and can be helpful as well in abstracting the reshaping process):

# pip install pyjanitor
import pandas as pd
import janitor

new_value = {'value' : lambda df: range(df.min(), df.max()+1)}

# expose the missing values per group via the `by` parameter
df.complete(new_value, by='group', sort = True)

   group  value  other
0      a      1   82.0
1      a      2   15.0
2      a      3    NaN
3      a      4    NaN
4      a      5    4.0
5      b      3   95.0
6      b      4   36.0
7      b      5   32.0
8      b      6    NaN
9      b      7   29.0
10     c      4   18.0
11     c      5    NaN
12     c      6    NaN
13     c      7   14.0
14     c      8    NaN
15     c      9   87.0

Upvotes: 1

Asish M.
Asish M.

Reputation: 2647

>>> df.groupby('group').apply(lambda x: x.set_index('value').reindex(np.arange(x['value'].min(), x['value'].max() + 1))).drop(columns='group').reset_index()
   group  value  other
0      a      1   82.0
1      a      2   15.0
2      a      3    NaN
3      a      4    NaN
4      a      5    4.0
5      b      3   95.0
6      b      4   36.0
7      b      5   32.0
8      b      6    NaN
9      b      7   29.0
10     c      4   18.0
11     c      5    NaN
12     c      6    NaN
13     c      7   14.0
14     c      8    NaN
15     c      9   87.0

We group on the group column and then re-index each group with the range from the min to the max of the value column

Upvotes: 4

Related Questions