Reputation: 323
I have a pandas dataframe and want to turn all the dates into the last date of the previous month. For example, "2020-02-04" should turn into "2020-01-31", "2020-03-03" should turn into "2020-02-29" (2020 is a leap year), and so on. My df looks like this (in the month column I already have the right month for my wanted date):
In[76]: dfall[["date", "month"]]
Out[76]:
date month
0 2020-02-04 1
1 2020-03-03 2
2 2020-04-02 3
3 2020-05-05 4
4 2020-06-03 5
5 2020-07-02 6
Now I tried this:
import calendar
import datetime
today = datetime.datetime.now()
dfall.date = str(today.year) + "-" + str(dfall.month) + "-" + str(calendar.monthrange(today.year,dfall.month)[1])
The idea was to build the new date by adding the strings together. But this code raises an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I know the error is coming from this part: str(calendar.monthrange(today.year,dfall.month)[1])
(Without this part the code runs without error, but the result is not what I want.) It's probably because Python doesn't know which month to take from dfall.month. Does anybody know how I could handle that problem?
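To see where the error comes from, here is a minimal sketch on a made-up stand-in frame (`demo` and its values are assumptions for illustration, not the real `dfall`). `calendar.monthrange` expects a single integer month; when it receives a whole Series, its internal range check has to reduce a Series of booleans to one truth value, which pandas refuses to do.

```python
import calendar

import pandas as pd

# Hypothetical stand-in for dfall; only the "month" column matters here.
demo = pd.DataFrame({"month": [1, 2, 3]})

# Passing the whole Series fails: monthrange validates the month with a
# chained comparison, which needs a single True/False value.
try:
    calendar.monthrange(2020, demo["month"])
    raised = False
except ValueError:
    raised = True

# Calling it once per element works and yields the number of days per month.
days_in_month = demo["month"].apply(lambda m: calendar.monthrange(2020, m)[1])
print(raised, days_in_month.tolist())
```

Note that `str(dfall.month)` does not raise, but it returns the printed representation of the whole Series rather than one string per row, which is why the result looks wrong even without the `monthrange` part.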
Upvotes: 2
Views: 2432
Reputation: 403
Alternative approach:
from datetime import datetime, timedelta
def convert_date(date_str):
    # Parse the string, step back by the day-of-month, and format it again.
    date = datetime.strptime(date_str, '%Y-%m-%d')
    return (date - timedelta(days=date.day)).strftime('%Y-%m-%d')
dfall.date = dfall.date.apply(convert_date)
Upvotes: 1
Reputation: 367
import pandas as pd
from datetime import datetime
dates = [datetime(2020, 2, 4), datetime(2020, 3, 3), datetime(2020, 4, 2), datetime(2020, 5, 5), datetime(2020, 6, 3), datetime(2020, 7, 2)]
month = [1, 2, 3, 4, 5, 6]
ts = pd.Series(month, index=dates)
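# Shift the index back to the previous month end ('M' is the month-end alias; pandas 2.2+ prefers 'ME')
date_col = ts.shift(-1, freq='M').index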
pd.DataFrame({'Dates': date_col, 'Month': month})
Upvotes: 1
Reputation: 6483
As an alternative, you could try this instead:
dfall.date=dfall.date.apply(lambda x: x.replace(day=1)- pd.Timedelta(days=1))
If dfall.date holds strings, convert it first:
dfall.date=pd.to_datetime(dfall.date).apply(lambda x: x.replace(day=1)- pd.Timedelta(days=1))
Another option is this vectorized helper, written by Kyle Barron, which avoids df.apply(lambda x: x.replace(day=1)) and is reported to be up to 8.5x faster:
def vec_dt_replace(series, year=None, month=None, day=None):
    return pd.to_datetime(
        {'year': series.dt.year if year is None else year,
         'month': series.dt.month if month is None else month,
         'day': series.dt.day if day is None else day})
#dfall.date=pd.to_datetime(dfall.date) #(if dfall.date is type string)
dfall.date=vec_dt_replace(dfall.date,day=1)- pd.Timedelta(days=1)
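A further vectorized option, sketched here as an assumption rather than something taken from the answers in this thread, is to subtract a `pd.offsets.MonthEnd` offset. Subtracting one month end rolls each timestamp back to the closest month end strictly before it. The `demo` frame is a made-up stand-in for `dfall`.

```python
import pandas as pd

# Hypothetical stand-in for dfall; the last row is already a month end.
demo = pd.DataFrame(
    {"date": ["2020-02-04", "2020-03-03", "2020-04-02", "2020-03-31"]}
)

# Parse the strings, then roll every timestamp back by one month end.
dates = pd.to_datetime(demo["date"])
demo["date"] = dates - pd.offsets.MonthEnd(1)
print(demo)
```

A date that already falls on a month end also moves to the end of the previous month, which matches what the question asks for.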
If you want to keep your original solution, change str(dfall.month) to dfall.month.astype(str) and str(calendar.monthrange(today.year,dfall.month)[1]) to dfall.month.apply(lambda x:calendar.monthrange(today.year,x)[1]).astype(str), then convert the result with pd.to_datetime(dfall.date):
import calendar
import datetime
today = datetime.datetime.now()
dfall.date = str(today.year) + "-" + dfall.month.astype(str) + "-" + dfall.month.apply(lambda x:calendar.monthrange(today.year,x)[1]).astype(str)
dfall.date = pd.to_datetime(dfall.date)
print(dfall)
Output of all solutions:
dfall[["date", "month"]]
date month
0 2020-01-31 1
1 2020-02-29 2
2 2020-03-31 3
3 2020-04-30 4
4 2020-05-31 5
5 2020-06-30 6
Upvotes: 1
Reputation: 25684
Assuming the 'date' column is of type string (use .astype(str) or strftime otherwise), you can cast the year-month part to datetime and subtract a timedelta of one day:
dfall['lastdaylastmonth'] = pd.to_datetime(dfall['date'].str[:-3]) - pd.Timedelta(days=1)
# dfall['lastdaylastmonth']
# 0 2020-01-31
# 1 2020-02-29
# 2 2020-03-31
# 3 2020-04-30
# 4 2020-05-31
# 5 2020-06-30
# Name: lastdaylastmonth, dtype: datetime64[ns]
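If the column is already datetime, the round trip through strings can probably be skipped. The following is a minimal sketch under that assumption: it truncates each timestamp to the start of its month via a monthly period and then steps back one day. The `demo` frame is a made-up stand-in for `dfall`.

```python
import pandas as pd

# Hypothetical stand-in for dfall with a datetime64 "date" column.
demo = pd.DataFrame({"date": pd.to_datetime(["2020-02-04", "2020-03-03"])})

# Truncate each timestamp to the first instant of its month...
first_of_month = demo["date"].dt.to_period("M").dt.start_time
# ...then step back one day to land on the last day of the previous month.
demo["lastdaylastmonth"] = first_of_month - pd.Timedelta(days=1)
print(demo)
```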
Upvotes: 1
Reputation: 43
Another approach:
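import datetime
for index, d in df.iterrows():
    temp = d["date"]
    dtObj = datetime.datetime.strptime(temp, "%Y-%m-%d")
    # Subtracting the day-of-month gives the last day of the previous month.
    newDt = dtObj - datetime.timedelta(days=dtObj.day)
    # Use .loc to avoid chained assignment, which may not write back to df.
    df.loc[index, "date"] = datetime.datetime.strftime(newDt, "%Y-%m-%d")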
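The same "subtract the day-of-month" idea can likely be done without a Python-level loop. Below is a minimal sketch, assuming the column holds "YYYY-MM-DD" strings and should stay strings afterwards. The `demo` frame is a made-up stand-in.

```python
import pandas as pd

# Hypothetical stand-in frame with string dates.
demo = pd.DataFrame({"date": ["2020-02-04", "2020-03-03", "2020-07-02"]})

parsed = pd.to_datetime(demo["date"], format="%Y-%m-%d")
# Subtract each row's day-of-month (as a timedelta) in one vectorized step.
shifted = parsed - pd.to_timedelta(parsed.dt.day, unit="D")
# Format back to strings so the column keeps its original type.
demo["date"] = shifted.dt.strftime("%Y-%m-%d")
print(demo)
```

On large frames this usually runs much faster than `iterrows`, since the arithmetic happens on whole columns at once.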
Upvotes: 1
Reputation: 464
import datetime
from datetime import timedelta
import pandas as pd
df = pd.DataFrame({"date":['2020-02-04','2020-03-03','2020-04-02','2020-05-05','2020-06-03','2020-07-02'],
                   "month": [1,2,3,4,5,6]})
# Convert the date strings to datetime objects
def change_time_format(series):
    return datetime.datetime.strptime(series,"%Y-%m-%d")
df.date = df.date.apply(change_time_format)
dates = list(df.date)
previous_m_last_date = []
for d in dates:
    # Step back by the day-of-month to reach the last day of the previous month
    days = d.day
    u_date = d - timedelta(days)
    previous_m_last_date.append(u_date)
df["updated_date"] = previous_m_last_date
df
Upvotes: 1