liv2hak

Reputation: 15010

Explanation of groupby() behaviour on a Pandas DataFrame

I have a CSV file, shown below:

Hour,L,Dr,Tag,Code,Vge
0,L5,XI,PS,4R,15
0,L3,St,sst,4R,17
5,L5,XI,PS,4R,12
2,L0,St,v2T,4R,11
8,L2,TI,sst,4R,8
12,L5,XI,PS,4R,18
2,L2,St,PS,4R,9
12,L3,XI,sst,4R,16

I run the following in my IPython notebook:

In[1]
    import pandas as pd
In[2]
    df = pd.read_csv('/python/concepts/pandas/in.csv')
In[3]
    df.head(n=9)

Out[3]:

       Hour   L  Dr  Tag Code  Vge
    0     0  L5  XI   PS   4R   15
    1     0  L3  St  sst   4R   17
    2     5  L5  XI   PS   4R   12
    3     2  L0  St  v2T   4R   11
    4     8  L2  TI  sst   4R    8
    5    12  L5  XI   PS   4R   18
    6     2  L2  St   PS   4R    9
    7    12  L3  XI  sst   4R   16

In[4]
    df.groupby('Hour')['Vge'].head(n=9)
Out[4]

    0    15
    1    17
    2    12
    3    11
    4     8
    5    18
    6     9
    7    16
    Name: Vge, dtype: int64

The output doesn't seem to be grouped by Hour. Rather, it looks like it is output in the order of the dataframe's internal index.

I am trying to understand how groupby is used with a Pandas dataframe. The usage hasn't clicked for me yet. I would appreciate it if someone could guide me.

Upvotes: 0

Views: 320

Answers (1)

Mike Müller

Reputation: 85622

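You need to do something with the groups. head(n=9) returns the first n rows of each group, but keeps the original index and row order, and since no group here has more than nine rows you simply get every row back. To get one result per group, apply an aggregation. For example: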

>>> df.groupby('Hour').sum(numeric_only=True)
      Vge
Hour     
0      32
2      20
5      12
8       8
12     34

or:

>>> df.groupby('Hour')['Vge'].count()
Hour
0     2
2     2
5     1
8     1
12    2
Name: Vge, dtype: int64
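
To see what the groups actually contain, here is a minimal, self-contained sketch. It assumes the CSV from the question is inlined as a string (the names CSV, grouped, first_rows and totals are only for illustration) and compares head() with an aggregation on the same grouped column:

```python
import io

import pandas as pd

# The CSV from the question, inlined so the example runs without a file.
CSV = """Hour,L,Dr,Tag,Code,Vge
0,L5,XI,PS,4R,15
0,L3,St,sst,4R,17
5,L5,XI,PS,4R,12
2,L0,St,v2T,4R,11
8,L2,TI,sst,4R,8
12,L5,XI,PS,4R,18
2,L2,St,PS,4R,9
12,L3,XI,sst,4R,16
"""

df = pd.read_csv(io.StringIO(CSV))
grouped = df.groupby('Hour')['Vge']

# head(n) selects the first n rows of every group, but returns them with
# the original index and row order, so nothing looks "grouped".
first_rows = grouped.head(n=9)

# An aggregation collapses each group to a single value, indexed by Hour.
totals = grouped.sum()

# Iterating over the groupby object yields (key, sub-Series) pairs,
# which shows how the rows were split.
for hour, values in grouped:
    print(hour, values.tolist())
```

Iterating like this is mostly useful for inspection. For real work an aggregation such as sum(), count() or mean(), or agg() with several functions, is usually what you want.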

Upvotes: 1
