chitown88
chitown88

Reputation: 28620

Pandas interpolate between 2 given numbers, given step increase and within groupby

This is a loaded question as I can find how each of these are done individually, but want to put it all together, particularly applied by groups. The goal is to to use 2 values as a range and interpolate values between them given a set interval. Next I want a column with the running sum (however, I am familiar with that. It's the interpolate functions that I don't understand within a groupby).

As stated, I started with a basic and it works perfectly, ie:

df = pd.DataFrame({'minute':[1,3,4,5,8],'value':[1,4,7,10,13]})
max_value = df['minute'].max()

df.index = df.minute
df2 = pd.DataFrame({'minute':range(0,max_value), 'Value':0})
df2.index = df2.minute

df2.value = df.value
df2= df2.fillna(0)

But now given an additional column, how do I apply this to 'id' 'a' and id 'h'?

So given this dataframe:

df = pd.DataFrame([['a',    '0',    '10'],
                ['a',   '1',    '10'],
                ['h',   '2',    '15'],
                ['a',   '1',    '10'],
                ['h',   '3',    '20'],
                ['h',   '13',   '5']], columns = ['id','minute','value'])

I'd like to generate this output, which would be groupby the id column, interpolate by the minute column where the min is 0, the max is the max value in that column, and input a 0 in the value column.

Example Output:

id  minute  value   sum
a   0        10     10
a   1        20     30
a   2         0     30
a   3         0     30
a   4         0     30
a   5         0     30
a   6         0     30
a   7         0     30
a   8         0     30
a   9         0     30
a   10        0     30
a   11        0     30
a   12        0     30
a   13        0     30
h   0         0      0
h   1         0      0
h   2        15     15
h   3        20     35
h   4         0     35
h   5         0     35
h   6         0     35
h   7         0     35
h   8         0     35
h   9         0     35
h   10        0     35
h   11        0     35
h   12        0     35
h   13        5     40

Upvotes: 2

Views: 295

Answers (1)

Chris Adams
Chris Adams

Reputation: 18647

You could try creating a MultiIndex using the from_product method and then reindex with it. Then use groupby.cumsum to create your 'sum' column:

min_idx = np.arange(df['minute'].max() + 1)
m_idx = pd.MultiIndex.from_product([df['id'].unique(), min_idx], names=['id', 'minute'])

df_new = df.set_index(['id', 'minute']).reindex(m_idx, fill_value=0).reset_index()
df_new['sum'] = df_new.groupby('id')['value'].cumsum()
df_new

[Out]

   id  minute  value  sum
0   a       0     10   10
1   a       1     10   20
2   a       2      0   20
3   a       3      0   20
4   a       4      0   20
5   a       5     10   30
6   a       6      0   30
7   a       7      0   30
8   a       8      0   30
9   a       9      0   30
10  a      10      0   30
11  a      11      0   30
12  a      12      0   30
13  a      13      0   30
14  h       0      0    0
15  h       1      0    0
16  h       2     15   15
17  h       3     20   35
18  h       4      0   35
19  h       5      0   35
20  h       6      0   35
21  h       7      0   35
22  h       8      0   35
23  h       9      0   35
24  h      10      0   35
25  h      11      0   35
26  h      12      0   35
27  h      13      5   40

Upvotes: 1

Related Questions