Reputation: 397
I have a dataset that looks something like this
testing = pd.DataFrame({'col':[1,np.nan,np.nan,7,1,np.nan,np.nan,7],
'col2':['01-MAY-17 15:47:00','01-MAY-17 15:57:00',
'07-MAY-17 15:47:00','07-MAY-17 22:07:00',
'01-MAY-17 15:47:00','01-MAY-17 15:57:00',
'07-MAY-17 15:47:00','07-MAY-17 22:07:00'],
'Customer_id':['A','A','A','A','B','B','B','B']})
I need to interpolate the missing values in the first column according to each customer (in this case, it wouldn't make a difference but since I have some customers that either their first or their last have missing values as missing, I really need to keep it separate).
Before, I was using this:
testing.groupby('Customer_id').apply(lambda group: group.interpolate(method= 'linear'))
but this assumes that every point is spaced equally, and as the second column is the date time of when each record was collected, it can be seen that it is not.
In order to change this in such a way that it would take the different spacing into account I passed col2 to the index, and interpolated with slinear
testing['col2'] = pd.to_datetime(testing['col2'])
testing['index1'] = testing.index
testing = testing.set_index('col2')
testing.apply(lambda group: group.interpolate(method= 'slinear'))
test_int=testing.interpolate(method='slinear')
test_int['col2'] = test_int.index
test_int = test_int.set_index('index1')
test_int
but this does not take into account the different customers. How could I do a group by for this situation?
Upvotes: 2
Views: 326
Reputation: 29635
IIUC, once you have set_index
the column with date, then you can use the method='index'
in the interpolate
per group such as:
testing.col2 = pd.to_datetime(testing.col2)
print (testing.set_index('col2').groupby('Customer_id')
.apply(lambda x: x.interpolate(method= 'index')).reset_index())
col2 col Customer_id
0 2017-05-01 15:47:00 1.000000 A
1 2017-05-01 15:57:00 1.006652 A
2 2017-05-07 15:47:00 6.747228 A
3 2017-05-07 22:07:00 7.000000 A
4 2017-05-01 15:47:00 1.000000 B
5 2017-05-01 15:57:00 1.006652 B
6 2017-05-07 15:47:00 6.747228 B
7 2017-05-07 22:07:00 7.000000 B
Upvotes: 2