i.n.n.m
i.n.n.m

Reputation: 3046

Can I use date index to create dummies in pandas?

I have been searching if I could create dummies using the date indexed in pandas, but could not find anything yet.

I have a df that is indexed by date

                        dew    temp   
date
2010-01-02 00:00:00      129.0  -16     
2010-01-02 01:00:00      148.0  -15     
2010-01-02 02:00:00      159.0  -11     
2010-01-02 03:00:00      181.0   -7      
2010-01-02 04:00:00      138.0   -7   
...  

I know I could setdate as a column using,

df.reset_index(level=0, inplace=True)

and then use something like this to create dummies,

df['main_hours'] = np.where((df['date'] >= '2010-01-02 03:00:00') & (df['date'] <= '2010-01-02 05:00:00')1,0)

However, I would like to create dummy variables using indexed date on the fly without using date as a column. Is there a way in pandas like that? Any suggestion would be appreciated.

Upvotes: 2

Views: 2076

Answers (3)

Alexander
Alexander

Reputation: 109696

df = df.assign(main_hours=0)
df.loc[df.between_time(start_time='3:00', end_time='5:00').index, 'main_hours'] = 1
>>> df
                     dew  temp  main_hours
2010-01-02 00:00:00  129   -16           0
2010-01-02 01:00:00  148   -15           0
2010-01-02 02:00:00  159   -11           0
2010-01-02 03:00:00  181    -7           1
2010-01-02 04:00:00  138    -7           1

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210952

IIUC:

df['main_hours'] = \
    np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'),
             1,
             0)

or:

In [8]: df['main_hours'] = \
            ((df.index >= '2010-01-02 03:00:00') & 
             (df.index <= '2010-01-02 05:00:00')).astype(int)

In [9]: df
Out[9]:
                       dew  temp  main_hours
date
2010-01-02 00:00:00  129.0   -16           0
2010-01-02 01:00:00  148.0   -15           0
2010-01-02 02:00:00  159.0   -11           0
2010-01-02 03:00:00  181.0    -7           1
2010-01-02 04:00:00  138.0    -7           1

Timing: for 50.000 rows DF:

In [19]: df = pd.concat([df.reset_index()] * 10**4, ignore_index=True).set_index('date')

In [20]: pd.options.display.max_rows = 10

In [21]: df
Out[21]:
                       dew  temp
date
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7
...                    ...   ...
2010-01-02 00:00:00  129.0   -16
2010-01-02 01:00:00  148.0   -15
2010-01-02 02:00:00  159.0   -11
2010-01-02 03:00:00  181.0    -7
2010-01-02 04:00:00  138.0    -7

[50000 rows x 2 columns]

In [22]: %timeit ((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00')).astype(int)
1.58 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [23]: %timeit np.where((df.index  >= '2010-01-02 03:00:00') & (df.index <= '2010-01-02 05:00:00'), 1, 0)
1.52 ms ± 28.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [24]: df.shape
Out[24]: (50000, 2)

Upvotes: 2

BENY
BENY

Reputation: 323366

Or using between;

pd.Series(df.index).between('2010-01-02 03:00:00',  '2010-01-02 05:00:00', inclusive=True).astype(int)

Out[1567]: 
0    0
1    0
2    0
3    1
4    1
Name: date, dtype: int32

Upvotes: 2

Related Questions