Roberto
Roberto

Reputation: 562

Pandas count rows between in two date columns

I need count the rows has between the columns date_from and date_to, example:

I have this DataFrame: date_from date_to

0    2017-07-01  2017-07-03
1    2017-07-01  2017-07-05
2    2017-07-02  2017-07-04
3    2017-07-03  2017-07-04

I need count how rows has between the columns date_from and date_to, example:

              count
date
2017-07-01    2
2017-07-02    3
2017-07-03    3
2017-07-04    1

I has trying with:

df.groupby(['date_from','date_to']).size()

but the pandas count a row once

EDIT:

I need count how many rows are between two dates, The dataframe that only have one row with this:

    date_from     date_to
0 2017-07-01    2017-07-03

have this output: 2017-07-01 1 2017-07-02 1

Upvotes: 0

Views: 3531

Answers (1)

jezrael
jezrael

Reputation: 862641

I think you need:


df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')
df = df.stack().rename_axis(('a','b')).reset_index(name='c').set_index('c')
df = df.groupby('a').resample('d').ffill().groupby('c').size().reset_index(name='a')
print (df)
           c  a
0 2017-07-01  2
1 2017-07-02  3
2 2017-07-03  3
3 2017-07-04  1

Similar solution:

df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')
df = df.stack().rename_axis(('a','b')).reset_index(name='c').set_index('c')
df = df.groupby('a').resample('d')['b'].size().reset_index()
#
df = df['c'].value_counts().sort_index().rename_axis('a').reset_index()
print (df)
           a  c
0 2017-07-01  2
1 2017-07-02  3
2 2017-07-03  3
3 2017-07-04  1

And another solution with itertuples:

df['date_to'] = df['date_to'] - pd.to_timedelta(1, unit='d')

df=pd.concat([pd.Series(r.Index,
                        pd.date_range(r.date_from, r.date_to)) for r in df.itertuples()])
      .reset_index()

df = df['index'].value_counts().sort_index().rename_axis('a').reset_index(name='c')
print (df)
           a  c
0 2017-07-01  2
1 2017-07-02  3
2 2017-07-03  3
3 2017-07-04  1

Upvotes: 2

Related Questions