Rolfrider
Rolfrider

Reputation: 97

Pandas come back form groupBy

My goal is to take data and interpolate missing values based on specific column and specific type I achieved this goal but I have trouble to coming back to the shape of data frame before interpolation.

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None}
]


df = pd.DataFrame(data)
print(df)
post_interp = df.groupby("type").apply(lambda x: x.set_index(
    'avg_speed').sort_index().interpolate(method='index'))
print(post_interp)

first print:

    type  avg_speed  max_speed
0    Car         30      200.0
1    Car         20      100.0
2    Car         25        NaN
3  Plane        300     2000.0
4  Plane        200     1000.0
5  Plane        250        NaN

second print:

                  type  max_speed
type  avg_speed
Car   20           Car      100.0
      25           Car      150.0
      30           Car      200.0
Plane 200        Plane     1000.0
      250        Plane     1500.0
      300        Plane     2000.0

I would like to return to shape of data frame in print 1 with interpolated values.

Upvotes: 0

Views: 174

Answers (1)

jezrael
jezrael

Reputation: 862681

Add group_keys=False to DataFrame.groupby for avoid duplicated index and last add DataFrame.reset_index:

post_interp = (df.groupby("type", group_keys=False)
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index())

Another solution with double reset_index:

post_interp = (df.groupby("type")
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index(level=0, drop=True)
                 .reset_index())

Or you can create index before groupby:

post_interp = (df.set_index('avg_speed')
                 .sort_index()
                 .groupby("type", group_keys=False)
                 .apply(lambda x: x.interpolate(method='index'))
                 .reset_index())
print(post_interp)
   avg_speed   type  max_speed
0         20    Car      100.0
1         25    Car      150.0
2         30    Car      200.0
3        200  Plane     1000.0
4        250  Plane     1500.0
5        300  Plane     2000.0

Last if necessary same order of columns add DataFrame.reindex:

post_interp = post_interp.reindex(df.columns, axis=1)
print(post_interp)
    type  avg_speed  max_speed
0    Car         20      100.0
1    Car         25      150.0
2    Car         30      200.0
3  Plane        200     1000.0
4  Plane        250     1500.0
5  Plane        300     2000.0

Upvotes: 2

Related Questions