Reputation: 277
How to get the last 'n' groups after df.groupby() and combine them into a dataframe
data = pd.read_sql_query(sql=sqlstr, con=sql_conn, index_col='SampleTime')
grouped = data.groupby(data.index.date,sort=False)
Calling grouped.ngroups gives me a total of 277 groups. I want to combine the last 12 groups into a single dataframe.
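The SQL source isn't shown, so here is a hypothetical stand-in frame for trying the answers below: the names `idx` and `data` are illustrative, with three readings per day over five days, indexed by a `SampleTime` DatetimeIndex like the one in the question.

```python
import pandas as pd

# Hypothetical stand-in for the SQL result (the real query isn't available):
# 15 readings, 8 hours apart, so 3 readings per calendar day for 5 days.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

# Same grouping as in the question: one group per calendar date.
grouped = data.groupby(data.index.date, sort=False)
print(grouped.ngroups)  # 5
```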
Upvotes: 13
Views: 3135
Reputation: 29
Get the last n group keys and filter the dataframe with those keys:
n = 12
grouped = data.groupby(data.index.date)
keys = list(grouped.groups.keys())
last_n_groups = data.loc[pd.Index(data.index.date).isin(keys[-n:])]
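A runnable sketch of this key-filtering approach on hypothetical sample data (`idx`, `data` and `n = 2` are illustrative, not from the question). The mask is built from the per-row dates, since `data[data.index.date]` would try to select columns rather than rows.

```python
import pandas as pd

# Hypothetical sample data standing in for the SQL query result.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

n = 2
grouped = data.groupby(data.index.date)  # default sort=True: keys in date order
keys = sorted(grouped.groups)            # sorted() makes that order explicit
# Compare each row's date with the last n keys to get a boolean row mask.
mask = pd.Index(data.index.date).isin(keys[-n:])
last_n_groups = data.loc[mask]
```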
Upvotes: 0
Reputation: 164673
Pandas GroupBy objects are iterables. To extract the last n elements of an iterable, there's generally no need to build a list from the iterable and then slice off the last n elements. That is memory-expensive.
Instead, you can use either itertools.islice (as suggested by @mtraceur) or collections.deque. Both work in O(n) time.
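The difference between the two tools can be seen on a plain iterable (a small illustrative sketch, using `range(10)` as a stand-in for the groups): `islice` needs the total length to know where to start, while a bounded `deque` just keeps the last n items it sees.

```python
from collections import deque
from itertools import islice

n = 3
items = range(10)

# islice: skip the first len(items) - n items, then take the rest.
via_islice = list(islice(iter(items), max(0, len(items) - n), None))

# deque(maxlen=n): consume everything, keeping only the last n items.
via_deque = list(deque(iter(items), maxlen=n))

print(via_islice, via_deque)  # [7, 8, 9] [7, 8, 9]
```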
itertools.islice
Unlike a generator, a Pandas GroupBy object is an iterable that can be reused. Therefore, for a GroupBy object g you can calculate the number of groups via len(g) and then slice g with islice. Or, more idiomatically, you can use GroupBy.ngroups. Then use pd.concat to concatenate an iterable of dataframes:
from itertools import islice
from operator import itemgetter
import pandas as pd
g = data.groupby(data.index.date, sort=False)
res = pd.concat(islice(map(itemgetter(1), g), max(0, g.ngroups-12), None))
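A self-contained sketch of the islice approach on hypothetical sample data (`idx`, `data` and `n = 2` are illustrative). It also iterates `g` a second time to show that the GroupBy object is reusable, unlike a generator.

```python
from itertools import islice
from operator import itemgetter

import pandas as pd

# Hypothetical sample data: 3 readings per day for 5 days.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

n = 2
g = data.groupby(data.index.date, sort=False)
# Iterating g yields (key, frame) pairs; itemgetter(1) keeps only the frames.
# islice skips the first ngroups - n frames without materialising a list.
res = pd.concat(islice(map(itemgetter(1), g), max(0, g.ngroups - n), None))

# g can be iterated again: this second pass still sees every group.
groups_again = sum(1 for _ in g)
```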
collections.deque
Alternatively, you can use collections.deque and specify maxlen, then concatenate as before.
from collections import deque
from operator import itemgetter
import pandas as pd
grouped = data.groupby(data.index.date, sort=False)
res = pd.concat(deque(map(itemgetter(1), grouped), maxlen=12))
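The same deque approach as a runnable sketch on hypothetical sample data (`idx`, `data` and `n = 2` are illustrative). Unlike the islice version, it does not need ngroups up front: it makes one pass and the deque discards older frames as it goes.

```python
from collections import deque
from operator import itemgetter

import pandas as pd

# Hypothetical sample data: 3 readings per day for 5 days.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

n = 2
grouped = data.groupby(data.index.date, sort=False)
# maxlen=n: only the n most recently seen frames are kept during iteration.
last_frames = deque(map(itemgetter(1), grouped), maxlen=n)
res = pd.concat(last_frames)
```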
As described in the collections docs:
Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end.... They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest.
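The discarding behaviour quoted above can be seen with a small illustrative example (plain integers stand in for the group frames):

```python
from collections import deque

recent = deque(maxlen=3)
for item in range(6):
    # Once 3 items are held, each append drops one item from the left end.
    recent.append(item)

print(list(recent))  # [3, 4, 5]
```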
Upvotes: 11
Reputation: 6091
Use pd.concat on a list comprehension with groupby.get_group:
pd.concat([grouped.get_group(x) for x in list(grouped.groups.keys())[-12:]])
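A self-contained sketch of the get_group approach on hypothetical sample data (`idx`, `data` and `n = 2` are illustrative). The sample rows are already in date order, so the last keys are also the latest dates here.

```python
import pandas as pd

# Hypothetical sample data: 3 readings per day for 5 days.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

n = 2
grouped = data.groupby(data.index.date, sort=False)
# grouped.groups maps each key to its row labels; take the last n keys
# and fetch each group's frame by key.
last_keys = list(grouped.groups.keys())[-n:]
res = pd.concat([grouped.get_group(x) for x in last_keys])
```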
Upvotes: 1
Reputation: 9019
You could pass a list comprehension to pd.concat():
import pandas as pd
df = pd.DataFrame([
['A',1,2],
['A',7,6],
['B',1,3],
['B',9,9],
['C',1,8],
['A',4,3],
['C',7,6],
['D',4,2]],
columns=['Var','Val1','Val2'])
last_n = 2
grouped = df.groupby('Var')
pd.concat([grouped.get_group(group) for i, group in enumerate(grouped.groups) if i>=len(grouped)-last_n])
Yields:
Var Val1 Val2
4 C 1 8
6 C 7 6
7 D 4 2
Upvotes: 0
Reputation: 59274
Assuming you know the order of the groups:
grouped = zip(*data.groupby(data.index.date, sort=False))
pd.concat(list(grouped)[1][-12:])
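A sketch of the zip approach on hypothetical sample data (`idx`, `data`, `keys`, `frames` and `n = 2` are illustrative). `zip(*grouped)` transposes the (key, frame) pairs into one tuple of keys and one tuple of frames. Note that this holds every frame at once, which is the memory cost the islice and deque answers avoid.

```python
import pandas as pd

# Hypothetical sample data: 3 readings per day for 5 days.
idx = pd.date_range("2021-01-01", periods=15, freq=pd.Timedelta(hours=8), name="SampleTime")
data = pd.DataFrame({"value": range(15)}, index=idx)

n = 2
grouped = data.groupby(data.index.date, sort=False)
# Unpacking turns [(k1, f1), (k2, f2), ...] into (k1, k2, ...) and (f1, f2, ...).
keys, frames = zip(*grouped)
res = pd.concat(frames[-n:])
```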
Upvotes: 2