Reputation: 634
I have this vector:
pd.Series([19.280, 48.380, 51.240, 58.603, 60.380, 203.300, ...])
I want to insert equally spaced intermediate values between each pair of consecutive values, with the spacing as close as possible to an increment step of 4.
For the beginning of the vector, this gives:
pd.Series([19.280, 23.437, 27.594, 31.751, 35.909, 40.066, 44.223, 48.380, 51.240, 54.921, 58.603, 60.380, ...])
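To make the expected output concrete, here is a minimal sketch of the arithmetic behind it, assuming "closest to a step of 4" means rounding each gap divided by 4 to the nearest whole number of intervals (with at least one interval per gap). The names `target_step`, `n_intervals` and `actual_steps` are illustrative only.

```python
import numpy as np

values = np.array([19.280, 48.380, 51.240, 58.603, 60.380, 203.300])
target_step = 4

# Distance between each pair of consecutive values
gaps = np.diff(values)

# Number of equal intervals per gap: nearest integer to gap / step, at least 1
n_intervals = np.maximum(np.rint(gaps / target_step), 1).astype(int)

# Spacing actually used inside each gap
actual_steps = gaps / n_intervals

for gap, n, step in zip(gaps, n_intervals, actual_steps):
    print(f"gap={gap:8.3f}  intervals={int(n):2d}  step={step:.3f}")
```

For the first gap (19.280 to 48.380) this gives 7 intervals of about 4.157, which matches the values 23.437, 27.594, ... listed above.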
Upvotes: 2
Views: 887
Reputation: 1283
Using Series.interpolate, since your data is a pandas Series:
Series.interpolate fills NaN values by interpolating between the adjacent numeric values.
round is used to get 'close to the increment step' with the required number of increments.
Code:
import pandas as pd
import numpy as np

pds = pd.Series([19.280, 48.380, 51.240, 58.603, 60.380, 203.300], dtype='float64')
pds_filled = pd.Series(dtype='float64')
step_value = 4
for i in range(pds.size):
    pds_filled = pd.concat([pds_filled, pd.Series(pds[i], dtype='float64')],
                           ignore_index=True)
    # Note: Series.append is deprecated (removed in pandas 2.0), hence pd.concat
    if i == len(pds) - 1:
        break  # break after concat of the last element
    # Number of NaN placeholders to insert between pds[i] and pds[i + 1]
    no_inserts = round((pds[i + 1] - pds[i]) / step_value) - 1
    # print(f"i= {i}, no_inserts= {no_inserts}")
    for j in range(no_inserts):  # not executed when no_inserts <= 0
        pds_filled = pd.concat([pds_filled, pd.Series(np.nan, dtype='float64')],
                               ignore_index=True)
        # print(pds_filled)
# print(pds_filled)  # check the filled NaNs
pds_filled.interpolate(inplace=True)
# Series.interpolate() replaces NaNs with interpolated values
print(pds_filled)  # final pd.Series!
## print options
# print(pds_filled.tolist())
# print([f'{item:.3f}' for item in pds_filled.tolist()])
Result list:
['19.280', '23.437', '27.594', '31.751', '35.909', '40.066', '44.223', '48.380', '51.240', '54.922', '58.603', '60.380', '64.350', '68.320', '72.290', '76.260', '80.230', '84.200', '88.170', '92.140', '96.110', '100.080', '104.050', '108.020', '111.990', '115.960', '119.930', '123.900', '127.870', '131.840', '135.810', '139.780', '143.750', '147.720', '151.690', '155.660', '159.630', '163.600', '167.570', '171.540', '175.510', '179.480', '183.450', '187.420', '191.390', '195.360', '199.330', '203.300']
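The loop above grows the Series one element at a time with pd.concat, which can get slow for long inputs. As an alternative sketch (not the method used above), each gap can be filled directly with np.linspace. The helper name `fill_with_linspace` is hypothetical, and it assumes the same rounding rule with at least one interval per gap.

```python
import numpy as np
import pandas as pd

def fill_with_linspace(s, step):
    """Hypothetical helper: insert evenly spaced values between neighbours."""
    vals = s.to_numpy(dtype=float)
    pieces = []
    for left, right in zip(vals[:-1], vals[1:]):
        # Number of equal intervals in this gap, at least one
        n = max(int(round(abs(right - left) / step)), 1)
        # n intervals give n points once the right endpoint is left out;
        # the right endpoint is emitted as the start of the next piece
        pieces.append(np.linspace(left, right, n, endpoint=False))
    # Append the final original value, which no piece has covered yet
    pieces.append(vals[-1:])
    return pd.Series(np.concatenate(pieces))

pds = pd.Series([19.280, 48.380, 51.240, 58.603, 60.380, 203.300])
filled = fill_with_linspace(pds, 4)
print([f"{item:.3f}" for item in filled.tolist()])
```

This should produce the same 48 values as the interpolation approach, since linear interpolation over evenly placed NaN slots and np.linspace both divide each gap into equal parts.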
Notes:
The no_inserts calculation sets the number of steps; you can fine-tune it to adjust the step size.
Upvotes: 1
Reputation: 13242
Given:
import numpy as np
import pandas as pd
s = pd.Series([19.280, 48.380, 51.240, 58.603, 60.380, 203.300,])
0 19.280
1 48.380
2 51.240
3 58.603
4 60.380
5 203.300
dtype: float64
Doing:
s.name = 'Value'
df = s.to_frame()
# Mark how many 4-length spaces could fit between the values.
# We'll round here, other methods are possible as well.
df['space'] = df.Value.diff().fillna(0).div(4).round().astype(int)
# Make these into lists of NaN of each length.
df['space'] = df['space'].apply(lambda x: [np.nan]*x)
# Explode these lists.
df = df.explode('space')
# Drop the helper column.
df = df.drop('space', axis=1)
# Make the duplicate values NaN.
df.loc[df.duplicated(keep='last'), 'Value'] = np.nan
# Reset the index and interpolate the values (linear is default)
df = df.reset_index(drop=True).interpolate('linear')
# Squeeze it back to a Series.
s = df.squeeze()
print(s)
Output:
0 19.280000
1 23.437143
2 27.594286
3 31.751429
4 35.908571
5 40.065714
6 44.222857
7 48.380000
8 51.240000
9 54.921500
10 58.603000
11 60.380000
12 64.350000
13 68.320000
14 72.290000
15 76.260000
16 80.230000
17 84.200000
18 88.170000
19 92.140000
20 96.110000
21 100.080000
22 104.050000
23 108.020000
24 111.990000
25 115.960000
26 119.930000
27 123.900000
28 127.870000
29 131.840000
30 135.810000
31 139.780000
32 143.750000
33 147.720000
34 151.690000
35 155.660000
36 159.630000
37 163.600000
38 167.570000
39 171.540000
40 175.510000
41 179.480000
42 183.450000
43 187.420000
44 191.390000
45 195.360000
46 199.330000
47 203.300000
Name: Value, dtype: float64
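The code comment above mentions that rounding is only one option. As a rough sketch of how the choice affects the spacing, the snippet below compares np.round, np.floor and np.ceil on the same gaps. The dictionary `steps_by_mode` is an illustrative name, and clamping to at least one interval per gap is an assumption added here.

```python
import numpy as np
import pandas as pd

s = pd.Series([19.280, 48.380, 51.240, 58.603, 60.380, 203.300])
gaps = s.diff().dropna()  # distance between consecutive values

steps_by_mode = {}
for name, func in [("round", np.round), ("floor", np.floor), ("ceil", np.ceil)]:
    # Intervals per gap under this rounding rule, at least one
    n = np.maximum(func(gaps / 4), 1)
    # Resulting spacing inside each gap
    steps_by_mode[name] = gaps / n
    print(name, steps_by_mode[name].round(3).tolist())
```

With ceil the spacing never exceeds 4; with floor it never drops below 4 unless the gap itself is shorter than 4; round lands on whichever side is closer.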
Upvotes: 2