Nocas
Nocas

Reputation: 397

Pandas Slinear Interpolation grouping by another column

I have a dataset that looks something like this

testing = pd.DataFrame({'col':[1,np.nan,np.nan,7,1,np.nan,np.nan,7], 
                        'col2':['01-MAY-17 15:47:00','01-MAY-17 15:57:00',
                            '07-MAY-17 15:47:00','07-MAY-17 22:07:00',
                            '01-MAY-17 15:47:00','01-MAY-17 15:57:00',
                            '07-MAY-17 15:47:00','07-MAY-17 22:07:00'],
                        'Customer_id':['A','A','A','A','B','B','B','B']})

I need to interpolate the missing values in the first column according to each customer (in this case, it wouldn't make a difference but since I have some customers that either their first or their last have missing values as missing, I really need to keep it separate).

Before, I was using this:

testing.groupby('Customer_id').apply(lambda group: group.interpolate(method= 'linear'))

but this assumes that every point is spaced equally, and as the second column is the date time of when each record was collected, it can be seen that it is not.

In order to change this in such a way that it would take the different spacing into account I passed col2 to the index, and interpolated with slinear

testing['col2'] = pd.to_datetime(testing['col2'])
testing['index1'] = testing.index
testing = testing.set_index('col2')
testing.apply(lambda group: group.interpolate(method= 'slinear'))
test_int=testing.interpolate(method='slinear')
test_int['col2'] = test_int.index
test_int = test_int.set_index('index1')
test_int

but this does not take into account the different customers. How could I do a group by for this situation?

Upvotes: 2

Views: 326

Answers (1)

Ben.T
Ben.T

Reputation: 29635

IIUC, once you have set_index the column with date, then you can use the method='index' in the interpolate per group such as:

testing.col2 = pd.to_datetime(testing.col2)
print (testing.set_index('col2').groupby('Customer_id')
              .apply(lambda x: x.interpolate(method= 'index')).reset_index())
                 col2       col Customer_id
0 2017-05-01 15:47:00  1.000000           A
1 2017-05-01 15:57:00  1.006652           A
2 2017-05-07 15:47:00  6.747228           A
3 2017-05-07 22:07:00  7.000000           A
4 2017-05-01 15:47:00  1.000000           B
5 2017-05-01 15:57:00  1.006652           B
6 2017-05-07 15:47:00  6.747228           B
7 2017-05-07 22:07:00  7.000000           B

Upvotes: 2

Related Questions