Reputation: 81
New to Python and coding in general here so this should be pretty basic for most of you.
I created this dataframe with a datetime index.
Here's the dataframe:
df = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
I would now like to add a new variable to my df called "vacation" with a value of 1 if the date is between 2018-06-24 and 2018-08-24 and value of 0 if it's not between those dates. How can I go about doing this? I've created a variable with a range of vacation but I'm not sure how to put these two together along with creating a new column for "vacation" in my dataframe.
vacation = pd.date_range(start = '2018-06-24', end='2018-08-24')
Thanks in advance.
Upvotes: 1
Views: 2598
Reputation: 862406
Solution for a new DataFrame:
import numpy as np
import pandas as pd
i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
m = (i > '2018-06-24') & (i < '2018-08-24')
df = pd.DataFrame({'vacation': m.astype(int)}, index=i)
Or:
df = pd.DataFrame({'vacation':np.where(m, 1, 0)}, index=i)
print (df)
vacation
2018-01-01 0
2018-01-02 0
2018-01-03 0
2018-01-04 0
2018-01-05 0
...
2018-08-22 1
2018-08-23 1
2018-08-24 0
2018-08-25 0
2018-08-26 0
[238 rows x 1 columns]
Solution for adding a new column to an existing DataFrame:
Create a boolean mask by comparing the DatetimeIndex against both dates, chain the two comparisons with & (bitwise AND), and convert the mask to integers (True to 1 and False to 0), or use numpy.where:
i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df = pd.DataFrame({'a': 1}, index=i)
m = (df.index > '2018-06-24') & (df.index < '2018-08-24')
df['vacation'] = m.astype(int)
#alternative
#df['vacation'] = np.where(m, 1, 0)
print (df)
a vacation
2018-01-01 1 0
2018-01-02 1 0
2018-01-03 1 0
2018-01-04 1 0
2018-01-05 1 0
.. ...
2018-08-22 1 1
2018-08-23 1 1
2018-08-24 1 0
2018-08-25 1 0
2018-08-26 1 0
[238 rows x 2 columns]
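As a quick sanity check of the mask approach, here is a minimal sketch on a shorter date range. The range and the names idx, df_check and mask are only illustrative and not part of the original answer. It confirms that the strict comparisons leave both edge dates at 0:

```python
import pandas as pd

# Hypothetical short range that still covers the whole vacation window
idx = pd.date_range(start='2018-06-20', end='2018-08-28', freq='D')
df_check = pd.DataFrame(index=idx)

# Strict comparisons (> and <) exclude both boundary dates
mask = (df_check.index > '2018-06-24') & (df_check.index < '2018-08-24')
df_check['vacation'] = mask.astype(int)

# Look at the rows around the start of the window
print(df_check.loc[pd.Timestamp('2018-06-23'):pd.Timestamp('2018-06-26')])
# Number of days flagged as vacation
print(df_check['vacation'].sum())
```

If the edge dates should count as vacation, replacing > and < with >= and <= should be enough.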
Another solution uses the DatetimeIndex with DataFrame.loc. The difference is that the edge values 2018-06-24 and 2018-08-24 are included, so they also get 1:
df['vacation'] = 0
# select the 'vacation' column explicitly so other columns are not overwritten
df.loc['2018-06-24':'2018-08-24', 'vacation'] = 1
print (df)
a vacation
2018-01-01 1 0
2018-01-02 1 0
2018-01-03 1 0
2018-01-04 1 0
2018-01-05 1 0
.. ...
2018-08-22 1 1
2018-08-23 1 1
2018-08-24 1 1
2018-08-25 1 0
2018-08-26 1 0
[238 rows x 2 columns]
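Since the question already builds a vacation date range, one more option is to test index membership with isin. This is only a sketch, not part of the original answer, and the name df_isin is made up for illustration. Because pd.date_range includes both end points, the edge dates get 1 here as well:

```python
import pandas as pd

i = pd.date_range(start='2018-01-01', end='2018-08-26', freq='D')
df_isin = pd.DataFrame({'a': 1}, index=i)

# The vacation range from the question; both end points are part of it
vacation = pd.date_range(start='2018-06-24', end='2018-08-24')

# isin gives a boolean array: True where the index date is in the vacation range
df_isin['vacation'] = df_isin.index.isin(vacation).astype(int)

# Rows around the start of the window
print(df_isin.loc[pd.Timestamp('2018-06-22'):pd.Timestamp('2018-06-26')])
```

This only touches the new 'vacation' column, so existing columns such as 'a' are left unchanged.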
Upvotes: 2
Reputation: 342
First, pd.date_range(start='2018-01-01', end='2019-12-31', freq='D') does not create a DataFrame; it creates a DatetimeIndex. You can then turn it into a DataFrame by using it either as the index or as a separate column.
import numpy as np
import pandas as pd
# Having it as an index
datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({}, index=datetime_index)
# Using numpy.where() to create the Vacation column
df['Vacation'] = np.where((df.index >= '2018-06-24') & (df.index <= '2018-08-24'), 1, 0)
Or
# Having it as a column
datetime_index = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df = pd.DataFrame({'Date': datetime_index})
# Using numpy.where() to create the Vacation column
df['Vacation'] = np.where((df['Date'] >= '2018-06-24') & (df['Date'] <= '2018-08-24'), 1, 0)
Note: Displaying only the first five rows of the dataframe df.
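For the variant where the dates are in a column, Series.between can express the same inclusive check a bit more compactly. This is a sketch rather than a replacement for the code in this answer, and the names dates and df_between are illustrative only:

```python
import pandas as pd

dates = pd.date_range(start='2018-01-01', end='2019-12-31', freq='D')
df_between = pd.DataFrame({'Date': dates})

# between() includes both boundary dates by default
in_vacation = df_between['Date'].between('2018-06-24', '2018-08-24')
df_between['Vacation'] = in_vacation.astype(int)

# Show only the rows flagged as vacation
print(df_between[df_between['Vacation'] == 1].head())
```

This should give the same column as the np.where version with >= and <=.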
Upvotes: 2