Reputation: 141
I have two panda series, closedDate
and createdDate
, thats elements are pandas timestamps, class 'pandas._libs.tslib.Timestampclass 'pandas._libs.tslib.Timestamp
.
I have subtracted those two panda series to make a list, age
, of pandas timedelta.
closedDate = data.iloc[:,1]
createdDate = data.iloc[:,2]
age = [x-y for x,y in zip(closedDate, createdDate)]
Now, I want to get the average of age
but with my line of code I get an error.
In: average_age = functools.reduce(lambda x, y: x + y, age) / len(age)
Out: OverflowError: int too big to convert
How can I fix this??
Thanks!
Upvotes: 1
Views: 338
Reputation: 11
You should do this with series like jpp suggested, but barring that, heres how you could do it with the list you provided.
average_age = sum(age, timedelta()) / len(age)
Upvotes: 0
Reputation: 164623
You can and should aim to use vectorised functions for this task.
In this example, you can subtract one pd.Series
from another. You can then use the mean
method to calculate the average.
data = pd.DataFrame({'createdDate': [pd.Timestamp('2018-01-01'),
pd.Timestamp('2018-05-01')],
'closedDate': [pd.Timestamp('2018-02-01'),
pd.Timestamp('2018-07-01')]})
closedDate = data['closedDate']
createdDate = data['createdDate']
ages = closedDate - createdDate
print(ages)
# 0 31 days
# 1 61 days
# dtype: timedelta64[ns]
res = ages.mean()
print(res)
# 46 days 00:00:00
There are 2 main reasons why using vectorised functions are better in this case:
pd.Timestamp
are numeric arrays (source).zip
on arrays underperforms versus zip
on lists (source).Upvotes: 2