stockade

Reputation: 277

pandas - how to get last n groups of a groupby object and combine them as a dataframe

How can I get the last n groups after df.groupby() and combine them into a single DataFrame?

data = pd.read_sql_query(sql=sqlstr, con=sql_conn, index_col='SampleTime')
grouped = data.groupby(data.index.date,sort=False)

grouped.ngroups reports 277 groups in total. I want to combine the last 12 groups into one DataFrame.

Upvotes: 13

Views: 3135

Answers (5)

Deepak Soni

Reputation: 29

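Get the last n keys and filter the DataFrame by those keys:

n = 12
grouped = data.groupby(data.index.date)
keys = list(grouped.groups.keys())
# data.index.date is an array of dates, not a column label, so build a boolean mask from it
last_n_groups = data.loc[pd.Index(data.index.date).isin(keys[-n:])]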
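As a rough illustration of the key-filtering idea, here is a minimal sketch on made-up data. The frame, its "value" column, and n = 2 are assumptions for the example, not the asker's SQL data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: 5 days with 3 samples each (one every 8 hours).
idx = pd.date_range("2024-01-01", periods=15, freq=pd.Timedelta(hours=8))
data = pd.DataFrame({"value": np.arange(15)}, index=idx)

n = 2
grouped = data.groupby(data.index.date)   # default sort=True, so keys are in date order
keys = list(grouped.groups)               # one datetime.date per group

# Boolean mask: True for rows whose date is among the last n keys.
mask = pd.Index(data.index.date).isin(keys[-n:])
last_n_groups = data[mask]
```

Because the rows are selected with a mask, they keep their original order and no concatenation step is needed.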

Upvotes: 0

jpp

Reputation: 164673

Pandas GroupBy objects are iterables. To extract the last n elements of an iterable, there's generally no need to build a list of every element and then slice off the last n. Doing so is memory-expensive, because every group is held in memory at once.

Instead, you can use either itertools.islice (as suggested by @mtraceur) or collections.deque. Both still iterate over every group, so they run in time linear in the total number of groups, but they only retain the groups you ask for.

itertools.islice

Unlike a generator, a Pandas GroupBy object is an iterable which can be reused. Therefore, you can calculate the number of groups via len(g) for a GroupBy object g and then slice g via islice. Or, perhaps more idiomatic, you can use GroupBy.ngroups. Then use pd.concat to concatenate an iterable of dataframes:

from itertools import islice
from operator import itemgetter

g = data.groupby(data.index.date, sort=False)
res = pd.concat(islice(map(itemgetter(1), g), max(0, g.ngroups-12), None))
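To make the slicing arithmetic concrete, here is a small sketch on made-up data. The frame, the "value" column, and n = 2 are assumptions for illustration only.

```python
from itertools import islice
from operator import itemgetter

import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: 5 days with 3 samples each.
idx = pd.date_range("2024-01-01", periods=15, freq=pd.Timedelta(hours=8))
data = pd.DataFrame({"value": np.arange(15)}, index=idx)

n = 2
g = data.groupby(data.index.date, sort=False)

frames = map(itemgetter(1), g)      # iterating g yields (key, DataFrame) pairs; keep the frame
start = max(0, g.ngroups - n)       # number of leading groups to skip (0 if fewer than n exist)

# Materialise only the last n frames before concatenating.
res = pd.concat(list(islice(frames, start, None)))
```

The max(0, ...) guard matters when there are fewer than n groups: islice rejects a negative start, and clamping to 0 simply returns every group.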

collections.deque

Alternatively, you can use collections.deque and specify maxlen, then concatenate as before.

from collections import deque
from operator import itemgetter

grouped = data.groupby(data.index.date, sort=False)
res = pd.concat(deque(map(itemgetter(1), grouped), maxlen=12))

As described in the collections docs:

Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.... They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
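The discarding behaviour quoted above can be seen in a minimal sketch: first on plain integers, then on the groups of a made-up frame. The data, the "value" column, and maxlen=2 are assumptions for illustration.

```python
from collections import deque
from operator import itemgetter

import numpy as np
import pandas as pd

# A bounded deque fed 0..9 keeps only the 3 most recent items.
tail = deque(range(10), maxlen=3)
tail_items = list(tail)

# Hypothetical stand-in for `data`: 5 days with 3 samples each.
idx = pd.date_range("2024-01-01", periods=15, freq=pd.Timedelta(hours=8))
data = pd.DataFrame({"value": np.arange(15)}, index=idx)

grouped = data.groupby(data.index.date, sort=False)

# Each new group pushed in evicts the oldest one once the deque holds 2 frames.
last_frames = deque(map(itemgetter(1), grouped), maxlen=2)
res = pd.concat(list(last_frames))
```

Unlike the islice version, this does not need the group count up front, so it would also work on a one-shot iterator.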

Upvotes: 11

Yuca

Reputation: 6091

Use pd.concat with a list comprehension and GroupBy.get_group:

pd.concat([grouped.get_group(x) for x in list(grouped.groups.keys())[-12:]])

Upvotes: 1

rahlf23

Reputation: 9019

You could pass a list comprehension to pd.concat():

import pandas as pd

df = pd.DataFrame([
['A',1,2],
['A',7,6],
['B',1,3],
['B',9,9],
['C',1,8],
['A',4,3],
['C',7,6],
['D',4,2]],
columns=['Var','Val1','Val2'])

last_n = 2
grouped = df.groupby('Var')

pd.concat([grouped.get_group(group) for i, group in enumerate(grouped.groups) if i>=len(grouped)-last_n])

Yields:

  Var  Val1  Val2
4   C     1     8
6   C     7     6
7   D     4     2

Upvotes: 0

rafaelc

Reputation: 59274

Assuming you know the order of grouped:

# zip(*...) splits the (key, frame) pairs into a tuple of keys and a tuple of frames
grouped = zip(*data.groupby(data.index.date,sort=False))
pd.concat(list(grouped)[1][-12:])
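The zip(*...) step is terse, so here is a sketch of what it unpacks to, on made-up data. The frame, the "value" column, and the choice of the last 2 groups are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: 5 days with 3 samples each.
idx = pd.date_range("2024-01-01", periods=15, freq=pd.Timedelta(hours=8))
data = pd.DataFrame({"value": np.arange(15)}, index=idx)

# Iterating a GroupBy yields (key, frame) pairs; zip(*pairs) transposes them
# into one tuple of keys and one tuple of frames.
keys, frames = zip(*data.groupby(data.index.date, sort=False))

# Slice the tuple of frames, then concatenate the last 2.
res = pd.concat(list(frames[-2:]))
```

Note that this holds every group's frame in memory at once, which is the cost the islice and deque approaches avoid.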

Upvotes: 2
