RitaM
RitaM

Reputation: 143

Pandas: find the difference between two time columns (Duration)

I have this dataframe:

          Start_date         End_date           hour1     hour2      
0   2018-01-31 12:00:00 2019-03-17 21:45:00  12:00:00   21:45:00
1   2018-02-28 12:00:00 2019-03-24 21:45:00  12:00:00   21:45:00

I`m trying to create a new column with duration (need the output as a numerical value in seconds) only based on my columns (hour2 and hour1)

I´ve used this code to create my hour columns. Maybe the error is in here.

date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df

I tried this solution:

    date_df['hour2'] = pd.to_datetime(date_df['hour2'])
    date_df['hour1'] = pd.to_datetime(date_df['hour1'])
    date_df['NewColumn2']=date_df['hour2']-date_df['hour1'] 

Error:

        ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-283-b75adc651706> in <module>
    ----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
          2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
          3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
    
    ~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
        801             result = arg.map(cache_array)
        802         else:
    --> 803             values = convert_listlike(arg._values, format)
        804             result = arg._constructor(values, index=arg.index, name=arg.name)
        805     elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
    
    ~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
        457         assert format is None or infer_datetime_format
        458         utc = tz == "utc"
    --> 459         result, tz_parsed = objects_to_datetime64ns(
        460             arg,
        461             dayfirst=dayfirst,
    
    ~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
       2042 
       2043     try:
    -> 2044         result, tz_parsed = tslib.array_to_datetime(
       2045             data,
       2046             errors=errors,
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
    
    pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
    
    TypeError: <class 'datetime.time'> is not convertible to datetime

I also tried this solution:

    date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

and i got this error message:

    TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    142     try:
--> 143         result = expressions.evaluate(op, left, right)
    144     except TypeError:

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
    232         if use_numexpr:
--> 233             return _evaluate(op, op_str, a, b)  # type: ignore
    234     return _evaluate_standard(op, op_str, a, b)

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
    118     if result is None:
--> 119         result = _evaluate_standard(op, op_str, a, b)
    120 

~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
     67     with np.errstate(all="ignore"):
---> 68         return op(a, b)
     69 

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
    188     else:
    189         with np.errstate(all="ignore"):
--> 190             res_values = na_arithmetic_op(lvalues, rvalues, op)
    191 
    192     return res_values

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
    148             #  will handle complex numbers incorrectly, see GH#32047
    149             raise
--> 150         result = masked_arith_op(left, right, op)
    151 
    152     if is_cmp and (is_scalar(result) or result is NotImplemented):

~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
     90         if mask.any():
     91             with np.errstate(all="ignore"):
---> 92                 result[mask] = op(xrav[mask], yrav[mask])
     93 
     94     else:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

When I use the suggestion given below to initially load the data frame, the error actually ceases to exist. But the problem is that this same error is affecting my original data frame (the real exercise), so I need to understand what I'm doing wrong or what should I change to fix the problem.

How should i change the code?

Tks

Upvotes: 1

Views: 233

Answers (2)

BM_Kim
BM_Kim

Reputation: 36

I run your code in my PC. And it didn't make error. Your dataframe values were not str.

Already its' type was datetime. Your error message said that information.

TypeError: <class 'datetime.time'> is not convertible to datetime

Run this first date_df['NewColumn2']=date_df['hour2']-date_df['hour1']

And then should check type of values.

Below is your code that I run my PC.

date_df = pd.DataFrame(
    {
        "Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
        "End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
        "hour1": ["12:00:00", "12:00:00"],
        "hour2": ["21:45:00", "21:45:00"],
    }
)
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1'] 

Okay, now I understand what you done. You have to first check your value's type. VERY IMPORTANT.

I think that your 'Start_date' and 'End_date' is already datetime.datetime objects.

your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']

If you want to show only time difference. Do this. First, import datetime

import datetime
        
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
    lambda x: (datetime.datetime.min + x).time())

print(your_date_df)

index   Start_date  End_date    hour1   hour2   NewColumn2  NewColumn2_onlyTime
0   2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00    21:45:00    410 days 09:45:00   09:45:00
1   2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00    21:45:00    389 days 09:45:00   09:45:00

Upvotes: 1

Jonathan Leon
Jonathan Leon

Reputation: 5648

You don't need to get the hours if you want the difference from start to end. You can do this

data='''
          Start_date         End_date           hour1     hour2
0   2018-01-31 12:00:00  2019-03-17 21:45:00  12:00:00   21:45:00
1   2018-02-28 12:00:00  2019-03-24 21:45:00  12:00:00   21:45:00'''
df = pd.read_csv(io.StringIO(data), sep=' \s+', engine='python')
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
df['deltadays_seconds'] = (df.End_date-df.Start_date).dt.total_seconds()
df

           Start_date            End_date     hour1     hour2  deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00  12:00:00  21:45:00         35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00  12:00:00  21:45:00         33644700.0

You can sub in hour1 and hour2, but you get the same answer. hour1 and hour2 is just the representation of the total date and time.

Upvotes: 0

Related Questions