hta
hta

Reputation: 93

Indexing/Binning Time Series

I have a dataframe like bellow

    ID    Date
    111   1.1.2018
    222   5.1.2018
    333   7.1.2018
    444   8.1.2018
    555   9.1.2018
    666  13.1.2018

and I would like to bin them into 5 days intervals. The output should be

ID    Date       Bin
111   1.1.2018    1
222   5.1.2018    1
333   7.1.2018    2
444   8.1.2018    2
555   9.1.2018    2
666  13.1.2018    3

How can I do this in python, please?

Upvotes: 1

Views: 126

Answers (2)

jpp
jpp

Reputation: 164623

One way is to create an array of your date range and use numpy.digitize.

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

date_ranges = pd.date_range(df['Date'].min(), df['Date'].max(), freq='5D')\
                .astype(np.int64).values

df['Bin'] = np.digitize(df['Date'].astype(np.int64).values, date_ranges)

Result:

    ID       Date  Bin
0  111 2018-01-01    1
1  222 2018-01-05    1
2  333 2018-01-07    2
3  444 2018-01-08    2
4  555 2018-01-09    2
5  666 2018-01-13    3

Upvotes: 0

cs95
cs95

Reputation: 402303

Looks like groupby + ngroup does it:

df['Date'] = pd.to_datetime(df.Date, errors='coerce', dayfirst=True)
df['Bin'] = df.groupby(pd.Grouper(freq='5D', key='Date')).ngroup() + 1

df
    ID       Date  Bin
0  111 2018-01-01    1
1  222 2018-01-05    1
2  333 2018-01-07    2
3  444 2018-01-08    2
4  555 2018-01-09    2
5  666 2018-01-13    3

If you don't want to mutate the Date column, then you may first call assign for a copy based assignment, and then do the groupby:

df['Bin'] = df.assign(
       Date=pd.to_datetime(df.Date, errors='coerce', dayfirst=True)
   ).groupby(pd.Grouper(freq='5D', key='Date')).ngroup() + 1

df
    ID       Date  Bin
0  111   1.1.2018    1
1  222   5.1.2018    1
2  333   7.1.2018    2
3  444   8.1.2018    2
4  555   9.1.2018    2
5  666  13.1.2018    3

Upvotes: 2

Related Questions