wuwucat

Reputation: 2621

How to group a DataFrame by its index

I have a DataFrame "df" that looks like this:

In [28]: df[:100]
Out[28]: 
       distkm     modlat     modlon  reallat  reallon         time
0    9.325590  42.423024 -70.512309  42.5040 -70.5419  731800.5514
1    9.286476  42.416112 -70.519175  42.4956 -70.5539  731800.6319
0    4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
1    6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
2    7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
0    4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
1    6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
2    7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
3    7.329755  42.370420 -70.340029  42.4134 -70.4077  731802.3208
4    6.817408  42.355624 -70.337595  42.3942 -70.4021  731802.3972
0     ...
1     ...

I want to separate the DataFrame into pieces wherever "df.index" restarts at 0, like this:

     distkm     modlat     modlon  reallat  reallon         time
0    9.325590  42.423024 -70.512309  42.5040 -70.5419  731800.5514
1    9.286476  42.416112 -70.519175  42.4956 -70.5539  731800.6319
     distkm    modlat     modlon   reallat  reallon         time
0    4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
1    6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
2    7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
     distkm    modlat     modlon   reallat  reallon         time
0    4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
1    6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
2    7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
3    7.329755  42.370420 -70.340029  42.4134 -70.4077  731802.3208
4    6.817408  42.355624 -70.337595  42.3942 -70.4021  731802.3972

and then plot each of these small DataFrames as a figure. How can I do that? I tried "groupby(df.index)", but the result isn't what I want: it just puts all rows with the same index value together.

Upvotes: 1

Views: 169

Answers (1)

DSM

Reputation: 353009

[migrated from the comments]

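I don't know much about plotting, but it seems to me you can get groupby to do what you want by grouping on a running count of the rows where the index is 0, so the key goes up by one each time the index restarts [NB: this assumes your index consists of integers, not strings; if it holds strings, replace 0 with '0']: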

>>> grouped = df.reset_index().groupby(((df.index == 0)*1).cumsum())
>>> for n,g in grouped:
...     print(g)
...     
   index    distkm     modlat     modlon  reallat  reallon         time
0      0  9.325590  42.423024 -70.512309  42.5040 -70.5419  731800.5514
1      1  9.286476  42.416112 -70.519175  42.4956 -70.5539  731800.6319
   index    distkm     modlat     modlon  reallat  reallon         time
2      0  4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
3      1  6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
4      2  7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
   index    distkm     modlat     modlon  reallat  reallon         time
5      0  4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
6      1  6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
7      2  7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
8      3  7.329755  42.370420 -70.340029  42.4134 -70.4077  731802.3208
9      4  6.817408  42.355624 -70.337595  42.3942 -70.4021  731802.3972
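
To see why this key works, it can help to look at the intermediate arrays. The following is only a sketch on a made-up five-row frame (the data and the names `starts` and `key` are my own, not from the question), and it assumes an integer index that restarts at 0 for every segment:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the index restarts at 0 for each segment
df = pd.DataFrame(
    {"distkm": [9.3, 9.2, 4.4, 6.3, 7.4]},
    index=[0, 1, 0, 1, 2],
)

starts = df.index == 0   # boolean array, True where a new segment begins
key = np.cumsum(starts)  # running count of segment starts = segment label

print(starts.tolist())
print(key.tolist())
```

Rows that share a value in `key` belong to the same segment, which is exactly what groupby needs.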

For each group we can then set the index again, e.g.:

>>> g.set_index("index")
         distkm     modlat     modlon  reallat  reallon         time
index                                                               
0      4.456535  42.423877 -70.408784  42.4292 -70.4626  731802.0660
1      6.393979  42.405980 -70.367245  42.4297 -70.4382  731802.1556
2      7.447289  42.389719 -70.343267  42.4259 -70.4196  731802.2312
3      7.329755  42.370420 -70.340029  42.4134 -70.4077  731802.3208
4      6.817408  42.355624 -70.337595  42.3942 -70.4021  731802.3972

Upvotes: 1
