Reputation: 145
I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note not zero-padded day)
if I do df.dtype
then it shows the columns as being a objects rather than a string or a datetime. I need to be able to subtract 2 column values to work out the difference so I'm trying to get them into a state where I can do that.
At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'
.
I've tried multiple methods but have run into a problem every way I've tried. Any help would be appreciated. If I need to give any more information then I will.
Upvotes: 0
Views: 4939
Reputation: 14021
convert a column to datetime using this approach
df["Date"] = pd.to_datetime(df["Date"])
If column has empty values then change error level to coerce to ignore errors: Details
df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
After which you should be able to subtract two dates.
example:
import pandas
df = pandas.DataFrame(columns=['to','fr','ans'])
df.to = [pandas.Timestamp('2014-01-24 13:03:12.050000'), pandas.Timestamp('2014-01-27 11:57:18.240000'), pandas.Timestamp('2014-01-23 10:07:47.660000')]
df.fr = [pandas.Timestamp('2014-01-26 23:41:21.870000'), pandas.Timestamp('2014-01-27 15:38:22.540000'), pandas.Timestamp('2014-01-23 18:50:41.420000')]
(df.fr-df.to).astype('timedelta64[h]')
consult this answer for more details:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
If you want to directly load the column as datetime object while reading from csv, consider this example :
Pandas read csv dateint columns to datetime
Upvotes: 2
Reputation: 145
I found that the problem was to do with missing values within the column. Using coerce=True
so df["Date"] = pd.to_datetime(df["Date"], coerce=True)
solves the problem.
Upvotes: 1
Reputation: 2572
As suggested by @MaxU, you can use pd.to_datetime()
method to bring the values of the given column to the 'appropriate' format, like this:
df['datetime'] = pd.to_datetime(df.datetime)
You would have to do this on whatever columns you have that you need trasformed to the right dtype.
Alternatively, you can use parse_dates
argument of pd.read_csv()
method, like this:
df = pd.read_csv(path, parse_dates=[1,2,3])
where columns 1,2,3 are expected to contain data that can be interpreted as dates.
I hope this helps.
Upvotes: 3