Reputation: 33
I am trying to find the number of days between a list of dates in a Pandas Dataframe and the current date.
I want to create a new column with the number of days between the due dates and the current date. For example:
+---------------------+------------+
| Due Date | Difference |
+---------------------+------------+
| 2019-04-15 00:00:00 | 146 |
| 2019-02-11 00:00:00 | 83 |
| 2019-03-11 00:00:00 | 111 |
| 2019-01-04 00:00:00 | 45 |
| 2019-05-13 00:00:00 | 174 |
+---------------------+------------+
I was trying to do:
current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current
But I was getting wrong numbers. It works fine if I do them individually like:
df['Due Date'][0] - current
Any help would be great. Thanks!
Upvotes: 2
Views: 7715
Reputation: 109
I would check the format of df["Due Date"]
. If it's not in the same date time format as todays date, then switch it. Broadcasting (the simple subtraction)should work if the formats match.
Otherwise, try using a lambda function to apply your changes:
df['Difference'] = df['Due Date'].apply(lambda x: x - current, axis=1)
Upvotes: 0
Reputation: 164803
But I was getting wrong numbers.
Works fine for me on Pandas 0.23 / NumPy 1.14.3, assuming Due Date
is a datetime
series:
print(df['Due Date'] - np.datetime64('today'))
0 146 days
1 83 days
2 111 days
3 45 days
4 174 days
Name: Due Date, dtype: timedelta64[ns]
More idiomatic would be to use Pandas-generated objects and dt.days
if you want integers:
print((df['Due Date'] - pd.Timestamp('today')).dt.days)
0 145
1 82
2 110
3 44
4 173
Name: Due Date, dtype: int64
Note, for example, the one-day differential caused by working with the NumPy version. The real answer is between the two, but rounding up is probably not what one expects.
Upvotes: 2
Reputation: 2231
I think you need to convert them to datetime to perform date-like operations.
df['Due Date'] = pd.to_datetime(df['Due Date'])
So the complete code would look like this:
df['Due Date'] = pd.to_datetime(df['Due Date'])
current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current
EDIT: Also another possible issue: i think you would need to add the current date as a column (or a Panda Series) so a good fix will be:
current = np.datetime64('today')
df['current'] = np.datetime64('today')
df['Difference'] = df['Due Date'] - df['current']
Upvotes: 2