Python Pandas, setting groupby() group labels as index in a new dataframe

Question

I am a Python beginner trying to figure out how the group labels from a groupby operation can be used as the index of a new DataFrame. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada', 'Australia', 'UK', 'China', 'USA'],
            'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
            'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})

df:
         Country  Medals  Year
    0        USA      52  1979
    1        USA      30  1983
    2         UK      25  1987
    3      China      41  1991
    4     Canada      19  1995
    5  Australia      17  1999
    6         UK       9  2003
    7      China      14  2007
    8        USA      12  2011

c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()

c1:
Country
USA          3
UK           2
China        2
Canada       1
Australia    1

I want to create a new DataFrame from the c1 results above, in exactly that format, but I have not been able to. Below is what I get:

d1 = pd.DataFrame(np.array(c1), columns=['Frequency'])
d1:
   Frequency
0          3
1          2
2          2
3          1
4          1

I want the group labels as the index, not the default 0, 1, 2, 3, 4. This is exactly what I want:

Desired Output:
            Frequency
USA             3
UK              2
China           2
Canada          1
Australia       1

How can I achieve this? I guess that if I built a list of the countries and assigned it as the index, it might work. However, the original data I'm practising with has so many rows that building that list by hand is not practical. Any ideas would be highly appreciated.

Josh Rumbut · Accepted Answer

Edit: let's see how you like this one!

c1 = pd.DataFrame(c1.values, index=c1.index.values, columns=['Frequency'])
print(c1)

    Frequency
USA         3
UK          2
China       2
Canada      1
Australia   1

c1.values is roughly equivalent (for our purposes) to np.array(c1) but avoids needing to import numpy.
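
A shorter route, offered as a sketch rather than as part of the answer above: since c1 is already a Series whose index holds the country labels, Series.to_frame should produce the same frame in one call. The column name 'Frequency' comes from the question; the rename_axis(None) step is optional and only drops the 'Country' index name so the result matches the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'UK', 'China', 'Canada', 'Australia', 'UK', 'China', 'USA'],
    'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
    'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12],
})

# size() returns a Series whose index holds the group labels
c1 = df.groupby('Country', sort=False).size()

# to_frame keeps that index and names the single column
d1 = c1.to_frame('Frequency')

# optional: drop the 'Country' index name to match the desired output
d1 = d1.rename_axis(None)
print(d1)
```

This avoids pulling the values and the index apart and reassembling them, so there is no chance of the two getting out of step.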

Original response (doesn't quite work, left for posterity): You are likely looking for the set_index method.

It should work something like this:

c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()

c2 = c1.set_index(['Country'])

Let me know if this works for you!
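
For anyone wondering why the set_index attempt fails: size() returns a Series, and set_index is a DataFrame method, so the call raises an AttributeError. Below is a minimal sketch of a variant that does go through set_index, assuming the Series is first turned into a two-column DataFrame with reset_index. The name c2 follows the snippet above; 'Frequency' is the column name from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['USA', 'USA', 'UK', 'China', 'Canada', 'Australia', 'UK', 'China', 'USA'],
    'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
    'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12],
})

c1 = df.groupby('Country', sort=False).size()

# c1 is a Series, which is why c1.set_index(...) is not available
print(type(c1).__name__)

# reset_index moves the labels into a 'Country' column and names the counts,
# giving a DataFrame that set_index can then work on
c2 = c1.reset_index(name='Frequency').set_index('Country')
print(c2)
```

Here the index keeps the name 'Country'; c2.rename_axis(None) would remove it if the unlabelled index from the desired output is preferred.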
