Reputation: 562
I need count the rows has between the columns date_from and date_to, example:
I have this DataFrame: date_from date_to
0 2017-07-01 2017-07-03
1 2017-07-01 2017-07-05
2 2017-07-02 2017-07-04
3 2017-07-03 2017-07-04
I need count how rows has between the columns date_from and date_to, example:
count
date
2017-07-01 2
2017-07-02 3
2017-07-03 3
2017-07-04 1
I has trying with:
df.groupby(['date_from','date_to']).size()
but the pandas count a row once
EDIT:
I need count how many rows are between two dates, The dataframe that only have one row with this:
date_from date_to
0 2017-07-01 2017-07-03
have this output: 2017-07-01 1 2017-07-02 1
Upvotes: 0
Views: 3531
Reputation: 862641
I think you need:
date_to
stack
and create DatetimeIndex
by set_index
groupby
and resample
by day
s and aggregate by ffill
or count
groupby
+ size
or value_counts
df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')
df = df.stack().rename_axis(('a','b')).reset_index(name='c').set_index('c')
df = df.groupby('a').resample('d').ffill().groupby('c').size().reset_index(name='a')
print (df)
c a
0 2017-07-01 2
1 2017-07-02 3
2 2017-07-03 3
3 2017-07-04 1
Similar solution:
df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')
df = df.stack().rename_axis(('a','b')).reset_index(name='c').set_index('c')
df = df.groupby('a').resample('d')['b'].size().reset_index()
#
df = df['c'].value_counts().sort_index().rename_axis('a').reset_index()
print (df)
a c
0 2017-07-01 2
1 2017-07-02 3
2 2017-07-03 3
3 2017-07-04 1
And another solution with itertuples
:
df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')
df=pd.concat([pd.Series(r.Index,
pd.date_range(r.date_from, r.date_to)) for r in df.itertuples()])
.reset_index()
df = df['index'].value_counts().sort_index().rename_axis('a').reset_index(name='c')
print (df)
a c
0 2017-07-01 2
1 2017-07-02 3
2 2017-07-03 3
3 2017-07-04 1
Upvotes: 2