Reputation: 189
I have a dataframe df1
            date
sample
a1    2005-08-28
b1    2005-06-23
c1    2006-01-11
d1    ...
Ultimately, I want a dictionary of sets of samples, grouped by year. So something like:
dict_y = {"2005": {a1, b1}, "2006": {c1}, ...}
I thought the best way to approach this would be to use pandas groupby, but I can't seem to get it to work.
df2 = df1.reset_index()
df2 = df2.set_index([(df2["date"].dt.year)])
df3 = df2.groupby(df2.index.values)
But here df3 is not a dataframe neatly grouped by year, but just a "GroupBy object". What am I doing wrong here?
Upvotes: 2
Views: 54
Reputation: 29711
Another variant uses GroupBy's .groups attribute, which returns a dictionary. Convert each value of that dictionary from pd.Index to a set to extract the unique elements.
{k:set(v) for k,v in df.groupby(df['date'].dt.year).groups.items()}
Out[54]:
{2005: {'a1', 'b1'}, 2006: {'c1'}}
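The question asks for string keys such as "2005", whereas .groups keys are integers here. A minimal sketch of the same idea with the keys converted to strings follows; the sample frame is rebuilt from the three complete rows shown in the question, and the index name "sample" is taken from its printout.

```python
import pandas as pd

# Sample data rebuilt from the question (only the three complete rows).
df = pd.DataFrame(
    {"date": pd.to_datetime(["2005-08-28", "2005-06-23", "2006-01-11"])},
    index=pd.Index(["a1", "b1", "c1"], name="sample"),
)

# .groups maps each year to a pd.Index of the row labels in that group.
groups = df.groupby(df["date"].dt.year).groups

# Convert each year to a string key and each pd.Index to a set.
dict_y = {str(year): set(labels) for year, labels in groups.items()}
print(dict_y)
```

If integer keys are fine, drop the str() call.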
Upvotes: 1
Reputation: 862661
You can use groupby by dt.year and apply a lambda function that converts the index values of each group to a set. Finally, convert the result with to_dict:
import pandas as pd

df = pd.DataFrame({'date': [pd.Timestamp('2005-08-28 00:00:00'),
                            pd.Timestamp('2005-06-23 00:00:00'),
                            pd.Timestamp('2006-01-11 00:00:00')]}, index=['a1','b1','c1'])
print (df)
         date
a1 2005-08-28
b1 2005-06-23
c1 2006-01-11
df = df.groupby(df.date.dt.year).apply(lambda x: set(x.index.values)).to_dict()
print (df)
{2005: {'a1', 'b1'}, 2006: {'c1'}}
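As for the "GroupBy object" in the question: groupby on its own only describes how the rows are split, and nothing is computed until you aggregate, apply, or iterate. A rough sketch of the iteration route, assuming the same three-row sample frame as above (the names grouped and dict_y are only illustrative):

```python
import pandas as pd

# Same sample data as above, with the sample names as the index.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2005-08-28", "2005-06-23", "2006-01-11"])},
    index=["a1", "b1", "c1"],
)

# groupby returns a lazy GroupBy object, not a DataFrame.
grouped = df.groupby(df["date"].dt.year)

# Iterating yields (year, sub-DataFrame) pairs; collect each group's index labels.
dict_y = {year: set(frame.index) for year, frame in grouped}
print(dict_y)
```

This gives the same dictionary as the apply version without building an intermediate Series.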
Upvotes: 2