Reputation: 1406
I do have a 'input.csv' file with two columns i.e. 'start_date' and 'end_date'. I have to add column of the duration between the dates without using loops. I have tried the following code.
import pandas as pd
from datetime import datetime
data = pd.read_csv('data/input.csv')
# converting each data field into date format
data['start_date'] = map(lambda t:datetime.strptime(t,'%Y%M%d'), data.start_date.values)
data['end_date'] = map(lambda t:datetime.strptime(t,'%Y%M%d'), data.end_date.values)
# evaluating the duration
data['duration'] = data.end_date.values - data.start_date.values
It shows the following error
TypeError: unsupported operand type(s) for -: 'map' and 'map'
Initially the format of date is YYYYMMDD in string format like 20170516 20180931 etc. I tried a few alternative but didn't work.
Upvotes: 1
Views: 145
Reputation: 3560
Even simpler, you could do:
data["duration"] = data['end_date'] - data['start_date']
The type of the new "duration" column would be Timedelta. To get number of days:
data["duration"][0].days
returns number of days, type int
Upvotes: 0
Reputation: 1406
data['end_date'] = pd.to_datetime(data['end_date'], format='%Y%m%d')
data['start_date'] = pd.to_datetime(data['start_date'], format='%Y%m%d')
data["duration"] = data['end_date'] - data['start_date']
This worked. I used references from other comments.
Upvotes: 2
Reputation: 306
I was doing this yesterday! Cheers.
data['end_date'] = pd.to_datetime(data['end_date'])
data['start_date'] = pd.to_datetime(data['start_date'])
data["duration"] = data['end_date'] - data['start_date']
to_datetime is fairly robust and can deduce formats.
Upvotes: 1