Pandas interpolate between 2 given numbers, given step increase and within groupby

Question

This is a loaded question as I can find how each of these are done individually, but want to put it all together, particularly applied by groups. The goal is to to use 2 values as a range and interpolate values between them given a set interval. Next I want a column with the running sum (however, I am familiar with that. It's the interpolate functions that I don't understand within a groupby).

As stated, I started with a basic and it works perfectly, ie:

df = pd.DataFrame({'minute':[1,3,4,5,8],'value':[1,4,7,10,13]})
max_value = df['minute'].max()

df.index = df.minute
df2 = pd.DataFrame({'minute':range(0,max_value), 'Value':0})
df2.index = df2.minute

df2.value = df.value
df2= df2.fillna(0)

But now given an additional column, how do I apply this to 'id' 'a' and id 'h'?

So given this dataframe:

df = pd.DataFrame([['a',    '0',    '10'],
                ['a',   '1',    '10'],
                ['h',   '2',    '15'],
                ['a',   '1',    '10'],
                ['h',   '3',    '20'],
                ['h',   '13',   '5']], columns = ['id','minute','value'])

I'd like to generate this output, which would be groupby the id column, interpolate by the minute column where the min is 0, the max is the max value in that column, and input a 0 in the value column.

Example Output:

id  minute  value   sum
a   0        10     10
a   1        20     30
a   2         0     30
a   3         0     30
a   4         0     30
a   5         0     30
a   6         0     30
a   7         0     30
a   8         0     30
a   9         0     30
a   10        0     30
a   11        0     30
a   12        0     30
a   13        0     30
h   0         0      0
h   1         0      0
h   2        15     15
h   3        20     35
h   4         0     35
h   5         0     35
h   6         0     35
h   7         0     35
h   8         0     35
h   9         0     35
h   10        0     35
h   11        0     35
h   12        0     35
h   13        5     40

Chris Adams · Accepted Answer

You could try creating a MultiIndex using the from_product method and then reindex with it. Then use groupby.cumsum to create your 'sum' column:

min_idx = np.arange(df['minute'].max() + 1)
m_idx = pd.MultiIndex.from_product([df['id'].unique(), min_idx], names=['id', 'minute'])

df_new = df.set_index(['id', 'minute']).reindex(m_idx, fill_value=0).reset_index()
df_new['sum'] = df_new.groupby('id')['value'].cumsum()
df_new

[Out]

   id  minute  value  sum
0   a       0     10   10
1   a       1     10   20
2   a       2      0   20
3   a       3      0   20
4   a       4      0   20
5   a       5     10   30
6   a       6      0   30
7   a       7      0   30
8   a       8      0   30
9   a       9      0   30
10  a      10      0   30
11  a      11      0   30
12  a      12      0   30
13  a      13      0   30
14  h       0      0    0
15  h       1      0    0
16  h       2     15   15
17  h       3     20   35
18  h       4      0   35
19  h       5      0   35
20  h       6      0   35
21  h       7      0   35
22  h       8      0   35
23  h       9      0   35
24  h      10      0   35
25  h      11      0   35
26  h      12      0   35
27  h      13      5   40

Pandas interpolate between 2 given numbers, given step increase and within groupby

Answers (1)

Related Questions