Reputation: 97
My goal is to take data and interpolate missing values based on specific column and specific type I achieved this goal but I have trouble to coming back to the shape of data frame before interpolation.
data = [
{"type": "Car", "avg_speed": 30, "max_speed": 200},
{"type": "Car", "avg_speed": 20, "max_speed": 100},
{"type": "Car", "avg_speed": 25, "max_speed": None},
{"type": "Plane", "avg_speed": 300, "max_speed": 2000},
{"type": "Plane", "avg_speed": 200, "max_speed": 1000},
{"type": "Plane", "avg_speed": 250, "max_speed": None}
]
df = pd.DataFrame(data)
print(df)
post_interp = df.groupby("type").apply(lambda x: x.set_index(
'avg_speed').sort_index().interpolate(method='index'))
print(post_interp)
first print:
type avg_speed max_speed
0 Car 30 200.0
1 Car 20 100.0
2 Car 25 NaN
3 Plane 300 2000.0
4 Plane 200 1000.0
5 Plane 250 NaN
second print:
type max_speed
type avg_speed
Car 20 Car 100.0
25 Car 150.0
30 Car 200.0
Plane 200 Plane 1000.0
250 Plane 1500.0
300 Plane 2000.0
I would like to return to shape of data frame in print 1 with interpolated values.
Upvotes: 0
Views: 174
Reputation: 862681
Add group_keys=False
to DataFrame.groupby
for avoid duplicated index and last add DataFrame.reset_index
:
post_interp = (df.groupby("type", group_keys=False)
.apply(lambda x: x.set_index('avg_speed')
.sort_index()
.interpolate(method='index'))
.reset_index())
Another solution with double reset_index
:
post_interp = (df.groupby("type")
.apply(lambda x: x.set_index('avg_speed')
.sort_index()
.interpolate(method='index'))
.reset_index(level=0, drop=True)
.reset_index())
Or you can create index before groupby
:
post_interp = (df.set_index('avg_speed')
.sort_index()
.groupby("type", group_keys=False)
.apply(lambda x: x.interpolate(method='index'))
.reset_index())
print(post_interp)
avg_speed type max_speed
0 20 Car 100.0
1 25 Car 150.0
2 30 Car 200.0
3 200 Plane 1000.0
4 250 Plane 1500.0
5 300 Plane 2000.0
Last if necessary same order of columns add DataFrame.reindex
:
post_interp = post_interp.reindex(df.columns, axis=1)
print(post_interp)
type avg_speed max_speed
0 Car 20 100.0
1 Car 25 150.0
2 Car 30 200.0
3 Plane 200 1000.0
4 Plane 250 1500.0
5 Plane 300 2000.0
Upvotes: 2