Reputation: 11

How to create rows in a pandas dataframe that are averages of other rows?

Take a dataframe like this one:

import pandas as pd
info = {'Year': [2010, 2010, 2010, 2010, 2015, 2015, 2015, 2015],
        'Country': ['USA', 'Mexico', 'Canada', 'China', 'USA', 'Mexico', 'Canada', 'China'],
        'AgeAvg': [40, 44, 45, 49, 45, 46, 50, 52],
        'HeightAvg': [68, 65, 67, 68, 69, 70, 64, 67]}
df = pd.DataFrame(data=info)
df

   Year Country  AgeAvg  HeightAvg
0  2010     USA      40         68
1  2010  Mexico      44         65
2  2010  Canada      45         67
3  2010   China      49         68
4  2015     USA      45         69
5  2015  Mexico      46         70
6  2015  Canada      50         64
7  2015   China      52         67

I want to add rows for 2011, 2012, 2013, and 2014. These rows will follow the same Countries, and have a smoothed average of the variables. For example, 2011 USA Age will be 41, 2012 USA age 42, 2013 USA age 43, 2014 USA age 44. This way the age will span from 2010 to 2015. I would also like to do this for all variables (like height in this case), not just age. Is there a way to do this in Python with Pandas?

Upvotes: 1

Answers (2)

Rob Raymond

Reputation: 31226

generate all the combinations for all years
merge() to have all rows
interpolate() for each country (groupby())

pd.DataFrame(
    {
        "Year": range(df["Year"].min(), df["Year"].max()+1),
        "Country": [df["Country"].unique() for y in range(df["Year"].min(), df["Year"].max()+1)],
    }
).explode("Country").merge(df, on=["Year", "Country"], how="outer").groupby(
    "Country"
).apply(
    lambda d: d.interpolate()
)

	Year	Country	AgeAvg	HeightAvg
0	2010	USA	40	68
1	2010	Mexico	44	65
2	2010	Canada	45	67
3	2010	China	49	68
4	2011	USA	41	68.2
5	2011	Mexico	44.4	66
6	2011	Canada	46	66.4
7	2011	China	49.6	67.8
8	2012	USA	42	68.4
9	2012	Mexico	44.8	67
10	2012	Canada	47	65.8
11	2012	China	50.2	67.6
12	2013	USA	43	68.6
13	2013	Mexico	45.2	68
14	2013	Canada	48	65.2
15	2013	China	50.8	67.4
16	2014	USA	44	68.8
17	2014	Mexico	45.6	69
18	2014	Canada	49	64.6
19	2014	China	51.4	67.2
20	2015	USA	45	69
21	2015	Mexico	46	70
22	2015	Canada	50	64
23	2015	China	52	67

Upvotes: 0

Corralien

Reputation: 120509

Use pd.MultiIndex.from_product to reindex your dataframe and interpolate values:

mi = pd.MultiIndex.from_product([df['Country'].unique(),
                                 range(df.Year.min(), df.Year.max()+1)])

out = df.set_index(['Country', 'Year']).reindex(mi)
out = out.groupby(level=0).apply(lambda x: x.interpolate())

>>> out
             AgeAvg  HeightAvg
USA    2010    40.0       68.0
       2011    41.0       68.2
       2012    42.0       68.4
       2013    43.0       68.6
       2014    44.0       68.8
       2015    45.0       69.0
Mexico 2010    44.0       65.0
       2011    44.4       66.0
       2012    44.8       67.0
       2013    45.2       68.0
       2014    45.6       69.0
       2015    46.0       70.0
Canada 2010    45.0       67.0
       2011    46.0       66.4
       2012    47.0       65.8
       2013    48.0       65.2
       2014    49.0       64.6
       2015    50.0       64.0
China  2010    49.0       68.0
       2011    49.6       67.8
       2012    50.2       67.6
       2013    50.8       67.4
       2014    51.4       67.2
       2015    52.0       67.0

You can swap levels if you prefer Year first.

out = out.swaplevel().sort_index()

Upvotes: 1

How to create rows in a pandas dataframe that are averages of other rows?

Answers (2)

Related Questions