Slacke
Slacke

Reputation: 33

Python Difference Between Dates np array - current date

I am trying to find the number of days between a list of dates in a Pandas Dataframe and the current date.

I want to create a new column with the number of days between the due dates and the current date. For example:

+---------------------+------------+
|      Due Date       | Difference |
+---------------------+------------+
| 2019-04-15 00:00:00 |        146 |
| 2019-02-11 00:00:00 |         83 |
| 2019-03-11 00:00:00 |        111 |
| 2019-01-04 00:00:00 |         45 |
| 2019-05-13 00:00:00 |        174 |
+---------------------+------------+

I was trying to do:

current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current

But I was getting wrong numbers. It works fine if I do them individually like:

df['Due Date'][0] - current

Any help would be great. Thanks!

Upvotes: 2

Views: 7715

Answers (3)

ash_huddles
ash_huddles

Reputation: 109

I would check the format of df["Due Date"]. If it's not in the same date time format as todays date, then switch it. Broadcasting (the simple subtraction)should work if the formats match.

Otherwise, try using a lambda function to apply your changes:

df['Difference'] = df['Due Date'].apply(lambda x: x - current, axis=1)

Upvotes: 0

jpp
jpp

Reputation: 164803

But I was getting wrong numbers.

Works fine for me on Pandas 0.23 / NumPy 1.14.3, assuming Due Date is a datetime series:

print(df['Due Date'] - np.datetime64('today'))

0   146 days
1    83 days
2   111 days
3    45 days
4   174 days
Name: Due Date, dtype: timedelta64[ns]

More idiomatic would be to use Pandas-generated objects and dt.days if you want integers:

print((df['Due Date'] - pd.Timestamp('today')).dt.days)

0    145
1     82
2    110
3     44
4    173
Name: Due Date, dtype: int64

Note, for example, the one-day differential caused by working with the NumPy version. The real answer is between the two, but rounding up is probably not what one expects.

Upvotes: 2

Manrique
Manrique

Reputation: 2231

I think you need to convert them to datetime to perform date-like operations.

df['Due Date'] = pd.to_datetime(df['Due Date'])

So the complete code would look like this:

df['Due Date'] = pd.to_datetime(df['Due Date'])
current = np.datetime64('today')
df['Difference'] = df['Due Date'] - current

EDIT: Also another possible issue: i think you would need to add the current date as a column (or a Panda Series) so a good fix will be:

current = np.datetime64('today')
df['current'] = np.datetime64('today')
df['Difference'] = df['Due Date'] - df['current']

Upvotes: 2

Related Questions