Reputation: 730
I'm trying to understand various functions in Python as I come from an R background.
The question I face is: How do I add and subtract days/years/months from pandas based on a condition? In R, I can use the dplyr
package where mutate
and ifelse
will allow me to achieve it together with the lubridate
package.
Here is my reproducible data in R:
df = data.frame(date1=c("2017-07-07", "2017-02-11", "2017-05-22", "2017-04-27"))
library(lubridate)
df$date1 <- ymd(df$date1) + years(2)
df$day <- wday(df$date1, label=TRUE)
Input
date1 day
1 2019-07-07 Sun
2 2019-02-11 Mon
3 2019-05-22 Wed
4 2019-04-27 Sat
Task: Add a year to the date if the day is "Sun" and subtract a year from the date if day is "Sat", else IGNORE
R Code
library(dplyr)
df %>% mutate(newdate = ifelse(df$day == "Sun", date1 %m+% years(1),
ifelse(df$day == "Sat", date1 %m-% years(1), date1))) -> df
df$newdate <- as.Date(df$newdate, origin = "1970-01-01")
df$newday <- wday(df$newdate, label=T)
df
Output
date1 day newdate newday
1 2019-07-07 Sun 2020-07-07 Tue
2 2019-02-11 Mon 2019-02-11 Mon
3 2019-05-22 Wed 2019-05-22 Wed
4 2019-04-27 Sat 2018-04-27 Fri
Could someone share with me how to achieve this output using Pandas?
Upvotes: 1
Views: 368
Reputation: 146
This should work fine for you:
df = pd.DataFrame(data = {'date1':["2017-07-07", "2017-02-11", "2017-05-22", "2017-04-27"], 'day':["Sun", "Mon", "Wed", "Sat"]})
df['date1']= pd.to_datetime(df['date1'])
df['date1'] = df['date1'] + pd.DateOffset(years=2)
def func_year(row):
if row['day'] == 'Sun':
date = row['date1'] + pd.DateOffset(years=1)
elif row['day'] == 'Sat':
date = row['date1'] - pd.DateOffset(years=1)
else:
date = row['date1']
return date
df['new_date'] = df.apply(func_year, axis=1)
Upvotes: 1
Reputation: 862406
Use DateOffset
for add years with Series.dt.strftime
and %a
fo names of days:
df = pd.DataFrame({'date1':pd.to_datetime(["2017-07-07",
"2017-02-11",
"2017-05-22",
"2017-04-27"])})
df['date1'] += pd.offsets.DateOffset(years=2)
df['day'] = df['date1'].dt.strftime('%a')
For set values by multiple boolean masks use numpy.select
:
masks = [df['day'] == 'Sun',
df['day'] == 'Sat']
vals = [df['date1'] + pd.offsets.DateOffset(years=1),
df['date1'] - pd.offsets.DateOffset(years=1)]
df['newdate'] = np.select(masks, vals, default=df['date1'])
df['newday'] = df['newdate'].dt.strftime('%a')
print (df)
date1 day newdate newday
0 2019-07-07 Sun 2020-07-07 Tue
1 2019-02-11 Mon 2019-02-11 Mon
2 2019-05-22 Wed 2019-05-22 Wed
3 2019-04-27 Sat 2018-04-27 Fri
Upvotes: 2