Reputation: 143
I have this dataframe:
Start_date End_date hour1 hour2
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00
I`m trying to create a new column with duration (need the output as a numerical value in seconds) only based on my columns (hour2 and hour1)
I´ve used this code to create my hour columns. Maybe the error is in here.
date_df['hour1'] = date_df['Start_date'].dt.time
date_df['hour2'] = date_df['End_date'].dt.time
date_df
I tried this solution:
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-283-b75adc651706> in <module>
----> 1 date_df['hour2'] = pd.to_datetime(date_df['hour2'])
2 date_df['hour1'] = pd.to_datetime(date_df['hour1'])
3 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
801 result = arg.map(cache_array)
802 else:
--> 803 values = convert_listlike(arg._values, format)
804 result = arg._constructor(values, index=arg.index, name=arg.name)
805 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
~\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
457 assert format is None or infer_datetime_format
458 utc = tz == "utc"
--> 459 result, tz_parsed = objects_to_datetime64ns(
460 arg,
461 dayfirst=dayfirst,
~\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
2042
2043 try:
-> 2044 result, tz_parsed = tslib.array_to_datetime(
2045 data,
2046 errors=errors,
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime_object()
pandas\_libs\tslib.pyx in pandas._libs.tslib.array_to_datetime()
TypeError: <class 'datetime.time'> is not convertible to datetime
I also tried this solution:
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
and i got this error message:
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
142 try:
--> 143 result = expressions.evaluate(op, left, right)
144 except TypeError:
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in evaluate(op, a, b, use_numexpr)
232 if use_numexpr:
--> 233 return _evaluate(op, op_str, a, b) # type: ignore
234 return _evaluate_standard(op, op_str, a, b)
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_numexpr(op, op_str, a, b)
118 if result is None:
--> 119 result = _evaluate_standard(op, op_str, a, b)
120
~\Anaconda3\lib\site-packages\pandas\core\computation\expressions.py in _evaluate_standard(op, op_str, a, b)
67 with np.errstate(all="ignore"):
---> 68 return op(a, b)
69
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-286-bf4c33189e88> in <module>
----> 1 date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
~\Anaconda3\lib\site-packages\pandas\core\ops\common.py in new_method(self, other)
63 other = item_from_zerodim(other)
64
---> 65 return method(self, other)
66
67 return new_method
~\Anaconda3\lib\site-packages\pandas\core\ops\__init__.py in wrapper(left, right)
341 lvalues = extract_array(left, extract_numpy=True)
342 rvalues = extract_array(right, extract_numpy=True)
--> 343 result = arithmetic_op(lvalues, rvalues, op)
344
345 return left._construct_result(result, name=res_name)
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in arithmetic_op(left, right, op)
188 else:
189 with np.errstate(all="ignore"):
--> 190 res_values = na_arithmetic_op(lvalues, rvalues, op)
191
192 return res_values
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in na_arithmetic_op(left, right, op, is_cmp)
148 # will handle complex numbers incorrectly, see GH#32047
149 raise
--> 150 result = masked_arith_op(left, right, op)
151
152 if is_cmp and (is_scalar(result) or result is NotImplemented):
~\Anaconda3\lib\site-packages\pandas\core\ops\array_ops.py in masked_arith_op(x, y, op)
90 if mask.any():
91 with np.errstate(all="ignore"):
---> 92 result[mask] = op(xrav[mask], yrav[mask])
93
94 else:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
When I use the suggestion given below to initially load the data frame, the error actually ceases to exist. But the problem is that this same error is affecting my original data frame (the real exercise), so I need to understand what I'm doing wrong or what should I change to fix the problem.
How should i change the code?
Tks
Upvotes: 1
Views: 233
Reputation: 36
I run your code in my PC. And it didn't make error.
Your dataframe values were not str
.
Already its' type was datetime. Your error message said that information.
TypeError: <class 'datetime.time'> is not convertible to datetime
Run this first date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
And then should check type of values.
Below is your code that I run my PC.
date_df = pd.DataFrame(
{
"Start_date": ["2018-01-31 12:00:00", "2018-02-28 12:00:00"],
"End_date": ["2019-03-17 21:45:00", "2019-03-24 21:45:00"],
"hour1": ["12:00:00", "12:00:00"],
"hour2": ["21:45:00", "21:45:00"],
}
)
date_df['hour2'] = pd.to_datetime(date_df['hour2'])
date_df['hour1'] = pd.to_datetime(date_df['hour1'])
date_df['NewColumn2']=date_df['hour2']-date_df['hour1']
Okay, now I understand what you done. You have to first check your value's type. VERY IMPORTANT.
I think that your 'Start_date'
and 'End_date'
is already datetime.datetime
objects.
your_date_df['NewColumn2'] = your_date_df['End_date'] - your_date_df['Start_date']
If you want to show only time difference. Do this. First, import datetime
import datetime
your_date_df['NewColumn2_onlyTime'] = your_date_df['NewColumn2'].apply(
lambda x: (datetime.datetime.min + x).time())
print(your_date_df)
index Start_date End_date hour1 hour2 NewColumn2 NewColumn2_onlyTime
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 410 days 09:45:00 09:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 389 days 09:45:00 09:45:00
Upvotes: 1
Reputation: 5648
You don't need to get the hours if you want the difference from start to end. You can do this
data='''
Start_date End_date hour1 hour2
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00'''
df = pd.read_csv(io.StringIO(data), sep=' \s+', engine='python')
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
df['deltadays_seconds'] = (df.End_date-df.Start_date).dt.total_seconds()
df
Start_date End_date hour1 hour2 deltadays_seconds
0 2018-01-31 12:00:00 2019-03-17 21:45:00 12:00:00 21:45:00 35459100.0
1 2018-02-28 12:00:00 2019-03-24 21:45:00 12:00:00 21:45:00 33644700.0
You can sub in hour1 and hour2, but you get the same answer. hour1 and hour2 is just the representation of the total date and time.
Upvotes: 0