Akash Kumar
Akash Kumar

Reputation: 1406

Finding duration between two dates in python using pandas

I do have a 'input.csv' file with two columns i.e. 'start_date' and 'end_date'. I have to add column of the duration between the dates without using loops. I have tried the following code.

import pandas as pd
from datetime import datetime

data = pd.read_csv('data/input.csv')

# converting each data field into date format
data['start_date'] = map(lambda t:datetime.strptime(t,'%Y%M%d'), data.start_date.values)
data['end_date'] = map(lambda t:datetime.strptime(t,'%Y%M%d'), data.end_date.values)

# evaluating the duration
data['duration'] = data.end_date.values - data.start_date.values

It shows the following error

TypeError: unsupported operand type(s) for -: 'map' and 'map'

Initially the format of date is YYYYMMDD in string format like 20170516 20180931 etc. I tried a few alternative but didn't work.

Upvotes: 1

Views: 145

Answers (3)

Ron Kalian
Ron Kalian

Reputation: 3560

Even simpler, you could do: data["duration"] = data['end_date'] - data['start_date'] The type of the new "duration" column would be Timedelta. To get number of days: data["duration"][0].days returns number of days, type int

Upvotes: 0

Akash Kumar
Akash Kumar

Reputation: 1406

data['end_date'] = pd.to_datetime(data['end_date'], format='%Y%m%d')
data['start_date'] = pd.to_datetime(data['start_date'], format='%Y%m%d')
data["duration"] = data['end_date'] - data['start_date']

This worked. I used references from other comments.

Upvotes: 2

HSchmachty
HSchmachty

Reputation: 306

I was doing this yesterday! Cheers.

data['end_date'] = pd.to_datetime(data['end_date'])
data['start_date'] = pd.to_datetime(data['start_date'])
data["duration"] = data['end_date'] - data['start_date']

to_datetime is fairly robust and can deduce formats.

Upvotes: 1

Related Questions