Antti Ellonen

Reputation: 189

Get sets of index values, grouped by the year of a date column

I have a dataframe df1

        date      
sample
a1      2005-08-28
b1      2005-06-23
c1      2006-01-11  
d1      ...  

Ultimately, I want a dictionary of sets of samples, grouped by year. So something like

dict_y = {"2005": {"a1", "b1"}, "2006": {"c1"}, ...}

I thought the best way to approach this would be to use pandas groupby, but I can't seem to get it to work.

df2 = df1.reset_index()
df2 = df2.set_index([(df2["date"].dt.year)])
df3 = df2.groupby(df2.index.values)

But here df3 is not a dataframe neatly grouped by year, but just a "GroupBy object". What am I doing wrong here?

Upvotes: 2

Views: 54

Answers (2)

Nickil Maveli

Reputation: 29711

Another variant uses the GroupBy object's .groups attribute, which returns a dictionary mapping each group key to the index labels in that group.

Each value in that dictionary is a pd.Index, so convert it to a set to get the unique labels:

{k:set(v) for k,v in df.groupby(df['date'].dt.year).groups.items()}
Out[54]:
{2005: {'a1', 'b1'}, 2006: {'c1'}}
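
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the three rows shown in the question; the names `grouped` and `groups` are only illustrative. Note that the keys come out as integers, not the strings shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (only the three rows shown there)
df = pd.DataFrame(
    {"date": pd.to_datetime(["2005-08-28", "2005-06-23", "2006-01-11"])},
    index=pd.Index(["a1", "b1", "c1"], name="sample"),
)

# groupby() alone only builds a lazy GroupBy object; nothing is computed yet
grouped = df.groupby(df["date"].dt.year)

# .groups maps each group key (the year) to an Index of row labels
groups = grouped.groups

# Turn each Index of labels into a plain Python set
dict_y = {year: set(labels) for year, labels in groups.items()}
print(dict_y)
```

This also explains what the question ran into: the GroupBy object is not a result in itself, it only becomes one once you ask it for something such as .groups or an aggregation.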

Upvotes: 1

jezrael

Reputation: 862661

You can group by dt.year and apply a lambda function that converts each group's index values to a set. Finally, convert the result with to_dict:

import pandas as pd

df = pd.DataFrame({'date': [pd.Timestamp('2005-08-28 00:00:00'), 
                            pd.Timestamp('2005-06-23 00:00:00'), 
                            pd.Timestamp('2006-01-11 00:00:00')]}, index=['a1','b1','c1'])
print(df)
         date
a1 2005-08-28
b1 2005-06-23
c1 2006-01-11

# Group by year, collect each group's index labels into a set, then convert to a dict
dict_y = df.groupby(df.date.dt.year).apply(lambda x: set(x.index.values)).to_dict()
print(dict_y)
{2005: {'a1', 'b1'}, 2006: {'c1'}}
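
The question asked for string keys such as "2005", while the result above has integer keys. If the strings matter, one possible variation (a sketch, not the only way) is to group the index labels by the year formatted as text. The sample frame is rebuilt from the question, and the names `labels` and `year_str` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (only the three rows shown there)
df = pd.DataFrame(
    {"date": pd.to_datetime(["2005-08-28", "2005-06-23", "2006-01-11"])},
    index=["a1", "b1", "c1"],
)

# The index labels as a Series, so they can be grouped like a column
labels = df.index.to_series()

# The year of each row as a four-digit string, e.g. "2005"
year_str = df["date"].dt.strftime("%Y")

# Collect the labels of each year into a set, then convert to a plain dict
dict_y = labels.groupby(year_str).apply(set).to_dict()
print(dict_y)
```

Both Series share the same index, so the labels and the year strings line up row by row.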

Upvotes: 2
