Mark K

Reputation: 9376

Pandas to calculate date differences in dataframe rows

I have a CSV file with two columns, one of which is Date. I want to compute the interval between each pair of consecutive dates,

i.e.

'2013-11-01' - '2013-10-08',

'2013-12-02' - '2013-11-01' etc.


After loading the file with

df = pd.read_csv(f, sep='\t')
df_date = df["Date"]

I tried:

print (df["Date"].shift(-1) - df["Date"]).astype('timedelta64[d]')

and

print df['Date'].shift() - df['Date']

Both of them raised:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

What went wrong, and how can I correct it? Thank you.

Upvotes: 3

Views: 50

Answers (1)

jezrael

Reputation: 863701

The problem is that the Date column holds string representations of datetimes, so it must be converted first, e.g. with the parse_dates parameter of read_csv or with to_datetime. Then call Series.diff:

df = pd.read_csv(f, sep='\t', parse_dates=['Date'])

print (df["Date"].diff(-1))

Another solution:

df = pd.read_csv(f, sep='\t')
df["Date"] = pd.to_datetime(df["Date"])
print (df["Date"].diff(-1))
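
As a minimal, self-contained sketch of the approach above: the in-memory data below is hypothetical (the original file is not shown, so the "Value" column and its numbers are made up), and only the three dates come from the question. Note that diff(-1) computes the current row minus the next row, so its sign is the opposite of the question's shift(-1) attempt, which computes the next row minus the current one.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for the asker's file.
# Only the dates come from the question; "Value" is made up.
raw = "Date\tValue\n2013-10-08\t1\n2013-11-01\t2\n2013-12-02\t3\n"

# parse_dates converts the Date column to datetime64 while reading.
df = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["Date"])

# Next row minus current row, as in the question's shift(-1) attempt.
gap_to_next = df["Date"].shift(-1) - df["Date"]

# diff(-1) is current row minus next row, so the sign is flipped.
diff_back = df["Date"].diff(-1)

# Whole-day counts; the last row has no successor and stays NaN.
days_to_next = gap_to_next.dt.days
print(days_to_next.tolist())  # [24.0, 31.0, nan]
```

If positive intervals are wanted, negating the diff(-1) result or using the shift(-1) subtraction should both work once the column is a datetime type.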

Upvotes: 2
