Reputation: 343
I have a pandas DataFrame with 3 columns. The first column contains string values in ascending order at a regular interval (e.g. '20173070000', '20173070020', '20173070040', etc.). The second and third columns contain the corresponding numeric values. I would like to resample the first column to a step of one ('20173070000', '20173070001', '20173070002', ...), filling the second and third columns with NaN for the new rows, and then interpolate those NaN values.
I've looked into resampling, but resample appears to work only with datetime values. I have also looked into DataFrame.interpolate, but it only fills values that are already missing, and my dataset does not contain missing data. I am simply looking to increase the frequency of my entries, filling in between the existing values.
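Both observations can be checked with a small sketch. It uses a trimmed copy of the data, with the first column converted to an integer index (an assumption made for illustration), and assumes a reasonably recent pandas version:

```python
import pandas as pd

# Trimmed copy of the question's data, first column used as an integer index
df = pd.DataFrame({'A': [20173070000, 20173070020, 20173070040],
                   'B': [14.0, 14.1, 13.8]}).set_index('A')

# resample() needs a DatetimeIndex, TimedeltaIndex or PeriodIndex;
# on a plain integer index it raises TypeError
try:
    df.resample('1s')
    resample_ok = True
except TypeError:
    resample_ok = False

# interpolate() only fills NaN that already exist, so with no gaps
# it returns the same values
unchanged = df.interpolate().equals(df)

print(resample_ok, unchanged)
```

So the missing step is creating the NaN rows first; interpolate can only do its job after that.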
To give some reference, my current DataFrame looks like this:
0 1 2
0 20173070000 14.0 13.9
1 20173070020 14.1 14.1
2 20173070040 13.8 13.6
3 20173070060 13.7 13.7
4 20173070080 13.8 13.5
5 20173070100 13.9 14.0
I would like to generate a DataFrame that looks like:
0 1 2
0 20173070000 14.0 13.9
1 20173070001 NaN NaN
2 20173070002 NaN NaN
3 20173070003 NaN NaN
4 20173070004 NaN NaN
5 20173070005 NaN NaN
...
20 20173070020 14.1 14.1
21 20173070021 NaN NaN
...
I have no problem sorting out the interpolation afterwards, but I have not yet worked out how to upsample.
Upvotes: 0
Views: 1657
Reputation: 1216
I believe interpolate() is the way to go. After upsampling as you described, and assuming the column containing the values you want to interpolate is called 'val1', you can do:
df.loc[:, 'val1'] = df.loc[:, 'val1'].interpolate()
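As a minimal sketch of what that line does, assume an already upsampled frame whose index is the integer keys and whose 'val1' column has NaN between two known values (the frame below is built by hand for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical upsampled frame: 21 consecutive keys, only the endpoints known
df = pd.DataFrame({'val1': [14.0] + [np.nan] * 19 + [14.1]},
                  index=range(20173070000, 20173070021))

# Default method='linear' fills the gaps on a straight line between known values
df.loc[:, 'val1'] = df.loc[:, 'val1'].interpolate()

# Halfway between 14.0 and 14.1, so approximately 14.05
print(df.loc[20173070010, 'val1'])
```

The default linear method treats rows as equally spaced, which is fine here because the keys are consecutive. If the keys were unevenly spaced, interpolate(method='index') weights by the index values instead.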
Upvotes: 0
Reputation: 915
You can use the reindex method. By default, it fills NaN at every label in the new index that has no value in the original index.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [20173070000, 20173070020, 20173070040, 20173070060, 20173070080, 20173070100],
                   'B': [14, 14.1, 13.8, 13.7, 13.8, 13.9],
                   'C': [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})
# New index: every integer from the smallest to the largest key, inclusive
df = df.set_index('A').reindex(np.arange(np.min(df.A), np.max(df.A) + 1)).reset_index()
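Combining reindex with interpolate gives the full pipeline the question asks for. This is a minimal sketch that assumes the first column holds strings, as the question states; the column names A, B and C follow the example above, and converting the keys back to strings at the end is optional:

```python
import pandas as pd

# Question's data, with the key column stored as strings
df = pd.DataFrame({'A': ['20173070000', '20173070020', '20173070040',
                         '20173070060', '20173070080', '20173070100'],
                   'B': [14.0, 14.1, 13.8, 13.7, 13.8, 13.9],
                   'C': [13.9, 14.1, 13.6, 13.7, 13.5, 14.0]})

# Keys must be numeric to build a dense range; int64 holds these 11-digit values
df['A'] = df['A'].astype('int64')

# Every integer from the first key to the last, inclusive, named 'A'
full = pd.RangeIndex(df['A'].min(), df['A'].max() + 1, name='A')

# reindex inserts NaN rows for the new keys; interpolate fills them linearly
out = df.set_index('A').reindex(full).interpolate().reset_index()

# Optional: turn the keys back into strings if downstream code expects them
out['A'] = out['A'].astype(str)
print(out.head(3))
```

Giving the RangeIndex the name 'A' keeps the key column's name through reset_index, and linear interpolation is appropriate because the new keys are consecutive integers.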
Upvotes: 10