Reputation: 11
I could like to count the unique occurrences of names per day from two columns:
df = pd.DataFrame({
'ColA':['john wick','bloody mary','peter pan','jeff bridges','billy boy'],
'ColB':['bloody mary','jeff bridges','billy boy','billy boy','john wick'],
'date':['2000-01-01', '2000-01-01', '2000-01-03', '2000-01-03', '2000-01-03'],})
datetime_series = pd.to_datetime(df['date'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
df2 = df.set_index(datetime_index)
df2.drop('date',axis=1,inplace=True)
df2
Out[746]:
ColA ColB
2000-01-01 john wick bloody mary
2000-01-01 bloody mary jeff bridges
2000-01-03 peter pan billy boy
2000-01-03 jeff bridges billy boy
2000-01-03 billy boy john wick
So that I obtain a series or similar to the following:
unique occurrences of names
2000-01-01 3
2000-01-03 4
Upvotes: 1
Views: 654
Reputation: 862671
Use DataFrame.stack
with DataFrameGroupBy.nunique
and last Series.to_frame
:
df3 = df2.stack().groupby(level=0).nunique().to_frame(name='unique occurrences of names')
print (df3)
unique occurrences of names
2000-01-01 3
2000-01-03 4
Or alternative with DataFrame.melt
:
df3 = (df2.reset_index()
.melt('index')
.groupby('index')['value']
.nunique()
.to_frame(name='unique occurrences of names'))
Upvotes: 1