Reputation: 123
I'm struggling to figure out a way iterate through a dataframe, and use an if statement to determine whether a day is a weekday or weekend (and replicate, using the holidays package to find holidays)
| Date |col1 |col2 |col3 |----| week | <- Add
|------------|-----|-----|-----|----|------|
| 2019-10-27 | 1.5 | 2.4 | 1.9 |----| wknd |
| 2019-10-28 | 2.7 | 2.1 | 1.9 |----| wkdy |
| 2019-10-29 | 1.4 | 2.7 | 1.8 |----| wkdy |
My example df has a date column, and 3 columns of data. The actual dataset has around 5 years of data. I want to add some conditional that checks to see if that date is a weekend or weekday, and adds that as a new column. I have no issues iterating and checking the dates, but I can't find anything that allows me to append or write to the dataframe on a line by line basis.
I had something like this so far:
for i in df.date:
if i.weekday() < 5:
df['period2'] = ('wkdy')
else:
df['period2'] = ('wknd')
Obviously this doesn't meet my needs, as it just sets all columns in the DF to the last iterated value.
What is the best way to iterate through and apply this logic on a line by line basis? Am I missing something simple?
Upvotes: 0
Views: 61
Reputation: 457
you can use apply
. it return a series
or a DataFrame
,
df['week'] = df['date'].apply(lambda x: "wkdy" if x.weekday() < 5 else "wked")
there are some other function can do it, like df.map
, df.aggreagte
, because aggreagte
if basic of apply
.
Upvotes: 0
Reputation: 4284
You can use np.where
import numpy as np
df['period2'] = np.where(df['date'].dt.weekday < 5,'wkdy','wknd')
If you really want to use a for loop you should loop over the rows by doing
for i in range(df.shape[0]):
....
Upvotes: 2
Reputation: 188
I'd try to use the loc
function and apply the functions to entire columns at once so the operation is vectorized and much quicker to execute:
df['week'] = df['date'].weekday()
df['week'].loc[df['week'] < 5] = 'wkday'
df['week'].loc[df['week'] >= 5] = 'wknd'
Upvotes: 0