Johnny Metz

Reputation: 5985

get list of unique months from pandas column

Let's say I have the following pandas date_range:

import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')

I want to get a list of the unique months. This is what I've come up with so far, but there has to be a better way:

df = pd.DataFrame({'date': rng})
months = df.groupby(pd.Grouper(key='date', freq='M')).agg('sum').index.tolist()
formatted_m = [i.strftime('%m/%Y') for i in months]
# ['09/2017', '10/2017', '11/2017', '12/2017']

Note that the dates will be stored in a DataFrame column or index.
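For comparison, here is a minimal sketch of a different technique that none of the answers below use: convert each date to a monthly period with `to_period('M')` and de-duplicate the periods. It covers both the column case (through the `.dt` accessor) and the index case. The variable names are illustrative only.

```python
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')
df = pd.DataFrame({'date': rng})

# Column case: .dt.to_period('M') maps every date to its month, e.g. Period('2017-09', 'M');
# .unique() keeps each month once, in order of first appearance.
col_months = df['date'].dt.to_period('M').unique()
formatted_col = [p.strftime('%m/%Y') for p in col_months]

# Index case: a DatetimeIndex has to_period directly, so no .dt accessor is needed.
idx = df.set_index('date').index
formatted_idx = idx.to_period('M').unique().strftime('%m/%Y').tolist()

print(formatted_col)  # ['09/2017', '10/2017', '11/2017', '12/2017']
```

If the Period objects are kept instead of being formatted as strings, they also sort chronologically across years.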

Upvotes: 4

Views: 6710

Answers (4)

Michael Xu

Reputation: 198

I usually use this one. It's straightforward, but note that it returns only the month numbers (9, 10, 11, 12 here), without the year:

rng.month.unique()

Edit: to keep the year as well (probably not relevant any longer, but for the sake of completeness):

set(str(year) + str(month).zfill(2) for year, month in zip(rng.year, rng.month))
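To show why the year matters, here is a small sketch using a hypothetical date range that crosses a year boundary. Month numbers from different years collapse into one value, while `(year, month)` pairs keep every calendar month distinct.

```python
import pandas as pd

# Hypothetical range spanning 14 calendar months, Dec 2016 through Jan 2018
rng = pd.date_range('12/1/2016', '1/31/2018')

# Month numbers alone: Dec 2016 merges with Dec 2017, Jan 2018 with Jan 2017
month_numbers = rng.month.unique().tolist()

# (year, month) pairs stay distinct; sorted() orders the tuples chronologically
year_months = sorted(set(zip(rng.year, rng.month)))

print(len(month_numbers), len(year_months))  # 12 14
```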

Upvotes: 0

BENY

Reputation: 323316

You don't need to build the DataFrame:

(rng.year*100+rng.month).value_counts().index.tolist()
Out[861]: [201712, 201710, 201711, 201709]

Update:

set((rng.year*100+rng.month).tolist())
Out[865]: {201709, 201710, 201711, 201712}
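The integer keys (`year * 100 + month`) can be converted back into the question's MM/YYYY strings with modulo and integer division. Below is a minimal sketch. It sorts the keys first because a set has no defined order, and sorting these integers also sorts them chronologically.

```python
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')

# 201709 -> month 201709 % 100 == 9, year 201709 // 100 == 2017
keys = sorted(set((rng.year * 100 + rng.month).tolist()))
formatted = [f'{key % 100:02d}/{key // 100}' for key in keys]
print(formatted)  # ['09/2017', '10/2017', '11/2017', '12/2017']
```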

Upvotes: 1

jezrael

Reputation: 863056

Use numpy.unique on the result of DatetimeIndex.strftime, which returns an array of strings; np.unique also returns them sorted:

import numpy as np
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')
print(np.unique(rng.strftime('%m/%Y')).tolist())
['09/2017', '10/2017', '11/2017', '12/2017']
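Keep in mind that np.unique sorts the strings lexicographically. With '%m/%Y', a range that crosses a year boundary is therefore ordered by month before year. The sketch below uses a hypothetical range to show this; a year-first format such as '%Y-%m' sorts chronologically instead.

```python
import numpy as np
import pandas as pd

# Hypothetical range crossing a year boundary: Nov 2016 to Feb 2017
rng = pd.date_range('11/1/2016', '2/28/2017')

month_first = np.unique(rng.strftime('%m/%Y')).tolist()
year_first = np.unique(rng.strftime('%Y-%m')).tolist()

print(month_first)  # ['01/2017', '02/2017', '11/2016', '12/2016']
print(year_first)   # ['2016-11', '2016-12', '2017-01', '2017-02']
```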

If the input is a DataFrame column, use Anton vBR's solution:

print(df['date'].dt.strftime("%m/%y").unique().tolist())

Or drop_duplicates:

print(df['date'].dt.strftime("%m/%y").drop_duplicates().tolist())
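Both calls return the months in order of first appearance; what differs is the container. As a sketch: `unique()` gives an array of the distinct strings, while `drop_duplicates()` gives a Series that keeps the index label of each month's first row.

```python
import pandas as pd

rng = pd.date_range('9/1/2017', '12/31/2017')
df = pd.DataFrame({'date': rng})

s = df['date'].dt.strftime('%m/%y')
uniq = s.unique()            # array of the distinct strings
dedup = s.drop_duplicates()  # Series; its index holds the row of each month's first date

print(dedup.index.tolist())          # [0, 30, 61, 91]
print(list(uniq) == dedup.tolist())  # True
```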

Timings:

unique and drop_duplicates perform about the same:

rng = pd.date_range('9/1/1900', '12/31/2017')

df = pd.DataFrame({'date': rng})

In [54]: %timeit (df['date'].dt.strftime("%m/%y").unique().tolist())
1 loop, best of 3: 469 ms per loop

In [56]: %timeit (df['date'].dt.strftime("%m/%y").drop_duplicates().tolist())
1 loop, best of 3: 466 ms per loop
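Outside IPython, where %timeit isn't available, you can take a comparable measurement with the standard timeit module. This is a sketch only: the function names are hypothetical, and the timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

rng = pd.date_range('9/1/1900', '12/31/2017')
df = pd.DataFrame({'date': rng})


def with_unique():
    return df['date'].dt.strftime('%m/%y').unique().tolist()


def with_drop_duplicates():
    return df['date'].dt.strftime('%m/%y').drop_duplicates().tolist()


for fn in (with_unique, with_drop_duplicates):
    # Best of 3 single runs, mirroring "1 loop, best of 3" above
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f'{fn.__name__}: {best * 1000:.0f} ms per loop')
```

Both functions return the same list, because each keeps the first occurrence of every string in its original order.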

Upvotes: 9

Anton vBR

Reputation: 18916

Yes, or this:

df['date'].dt.strftime("%m/%y").unique().tolist()
#['09/17', '10/17', '11/17', '12/17']
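Note that %y is a two-digit year, so this output differs from the question's '09/2017' format, and dates a century apart produce the same string. A minimal sketch with two hypothetical dates shows that %Y keeps them distinct:

```python
import pandas as pd

# Two hypothetical dates exactly 100 years apart
df = pd.DataFrame({'date': pd.to_datetime(['2017-09-01', '1917-09-01'])})

two_digit = df['date'].dt.strftime('%m/%y').unique().tolist()
four_digit = df['date'].dt.strftime('%m/%Y').unique().tolist()

print(two_digit)   # ['09/17']
print(four_digit)  # ['09/2017', '09/1917']
```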

Upvotes: 5
