Reputation: 73
I have a dataframe called df that has a column named date, that has many dates not unique. I want to create another dataframe with 2 columns : 1 called date and the other called count, for each date I want to know how many times it got repeated.
I used the method value_counts like this date_count=df['date'].value_counts()
, this returns a Series object with the data as index? and the value as the only column (I didn't really understand this part and how the date got returned as the index).
I then used date_count=date_count.to_frame()
but still same problem.
I chose using value_counts() because it does the job as it removes duplicates and gives me the number of repetitions.
My questions are :1- How can I achieve getting a dataframe with date on a column and it's count value on second column?
2-Why does the date become an index on value_counts()
method?
Upvotes: 0
Views: 115
Reputation: 34056
Something like this would work with value_counts()
:
In [446]: df2
Out[446]:
dates
0 20-03-2020
1 21-03-2020
2 20-03-2020
3 22-03-2020
4 20-03-2020
In [451]: df = df2.dates.value_counts().reset_index().rename(columns={'index': 'dates', 'dates': 'counts'})
In [452]: df
Out[452]:
dates counts
0 20-03-2020 3
1 22-03-2020 1
2 21-03-2020 1
Upvotes: 1