Okechukwu Ossai

Reputation: 604

Python Pandas, setting groupby() group labels as index in a new dataframe

I am a Python beginner trying to figure out how the group labels from a groupby operation can be used as the index of a new DataFrame. For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada', 'Australia', 'UK', 'China', 'USA'],
                   'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
                   'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})

df:
         Country  Medals  Year
    0        USA      52  1979
    1        USA      30  1983
    2         UK      25  1987
    3      China      41  1991
    4     Canada      19  1995
    5  Australia      17  1999
    6         UK       9  2003
    7      China      14  2007
    8        USA      12  2011

c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()

c1:
Country
USA          3
UK           2
China        2
Canada       1
Australia    1

I want to create a new DataFrame that holds the c1 results in exactly that format, but I have not been able to. This is what I get:

d1 = pd.DataFrame(np.array(c1), columns=['Frequency'])
d1:
   Frequency
0          3
1          2
2          2
3          1
4          1

I want the group labels as the index rather than the default 0, 1, 2, 3 and 4. This is exactly what I want:

Desired Output:
            Frequency
USA             3
UK              2
China           2
Canada          1
Australia       1

How can I achieve this? I guess it might work if I built a list of the country labels by hand and assigned it as the index. However, the real data I'm practising with has so many rows that writing out such a list is impractical. Any ideas would be highly appreciated.

Upvotes: 5

Views: 29395

Answers (2)

Josh Rumbut

Reputation: 2710

Edit: let's see how you like this one!

c1 = pd.DataFrame(c1.values, index=c1.index.values, columns=['Frequency'])
print(c1)

    Frequency
USA         3
UK          2
China       2
Canada      1
Australia   1

c1.values is roughly equivalent (for our purposes) to np.array(c1) but avoids needing to import numpy.
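
To make the comparison concrete, here is a minimal sketch that rebuilds the question's data and checks both claims. The variable names `named` and `unnamed` are made up for illustration. It also shows a side effect worth knowing: passing `c1.index` keeps the index name "Country", while passing `c1.index.values` drops it, which is why the output above has no "Country" header row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA'],
                   'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
                   'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})

# Group sizes as a Series, in order of first appearance
c1 = df.groupby('Country', sort=False).size()

# c1.values and np.array(c1) hold the same counts
same_counts = bool((c1.values == np.array(c1)).all())

# Passing the Index object keeps its name ("Country")
named = pd.DataFrame(c1.values, index=c1.index, columns=['Frequency'])

# Passing the plain label array drops the name
unnamed = pd.DataFrame(c1.values, index=c1.index.values, columns=['Frequency'])

print(same_counts)         # True
print(named.index.name)    # Country
print(unnamed.index.name)  # None
```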

Original response (doesn't quite work, because c1 is a Series and set_index is a DataFrame method; left for posterity): You are likely looking for the set_index method.

It should work something like this:

c1 = df.groupby(df['Country'], as_index=True, sort=False, group_keys=True).size()

c2 = c1.set_index(['Country'])

Let me know if this works for you!
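
For anyone who still wants to go the set_index route, one possible variant (not part of the original answer) is to turn the Series into a DataFrame with reset_index first, which gives set_index a 'Country' column to work with. This is only a sketch on the question's sample data; the name `c2` follows the snippet above.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA'],
                   'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
                   'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})

c1 = df.groupby('Country', sort=False).size()

# size() returns a Series, and a Series has no set_index method
print(type(c1).__name__)         # Series
print(hasattr(c1, 'set_index'))  # False

# reset_index(name=...) yields a DataFrame with 'Country' and 'Frequency'
# columns; set_index then moves 'Country' back into the index
c2 = c1.reset_index(name='Frequency').set_index('Country')
print(c2)
```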

Upvotes: 2

Okechukwu Ossai

Reputation: 604

Finally, I figured out what seems to be a working solution. I realized that c1 is a Series, not a DataFrame, and that its index is accessible as c1.index. So I improved the code by passing that index explicitly:

d1 = pd.DataFrame(np.array(c1), index=c1.index, columns=['Frequency'])

d1:

           Frequency
Country             
USA                3
UK                 2
China              2
Canada             1
Australia          1

I don't know if this is the best solution. Better ideas are still welcome.
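
One alternative worth trying, offered as a sketch rather than the definitive answer: since c1 is already a Series carrying the right index, Series.to_frame can convert it directly without going through NumPy. The optional rename_axis(None) call removes the "Country" index name, which seems to match the desired output in the question. The names `d1` and `d2` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['USA', 'USA', 'UK', 'China', 'Canada',
                               'Australia', 'UK', 'China', 'USA'],
                   'Year': [1979, 1983, 1987, 1991, 1995, 1999, 2003, 2007, 2011],
                   'Medals': [52, 30, 25, 41, 19, 17, 9, 14, 12]})

c1 = df.groupby('Country', sort=False).size()

# Convert the Series to a one-column DataFrame, keeping its index
d1 = c1.to_frame(name='Frequency')

# Optionally drop the index name so no "Country" header row is printed
d2 = d1.rename_axis(None)

print(d2)
```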

Upvotes: 2
