Reputation: 1967
I have a simple question, and it seems to be one asked many times before (see e.g. here, here, and here). Nonetheless, I can't solve it.
I have a pandas dataframe that I read from a csv file. It contains a column named start-plan with strings in the format '05-04-2017' (April 5th, 2017). As far as I understand, it is a European date in the form %d-%m-%Y.
Here is what I do:
df = pd.read_csv('activities.csv')
This is what dataframe head looks like:
print(df.head())
start-plan start-actual end-plan end-actual user late
0 12-01-2017 16-01-2017 11-02-2017 10-02-2017 1 0
1 11-05-2017 15-05-2017 10-06-2017 18-06-2017 2 1
2 20-08-2017 20-08-2017 19-09-2017 05-10-2017 3 1
3 10-12-2017 10-12-2017 09-01-2018 08-01-2018 1 0
4 25-04-2017 25-04-2017 25-05-2017 26-05-2017 4 0
I try to convert the columns like this:
pd.to_datetime(pd.Series('start-plan'), format='%d-%m-%y')
I get an error stating that time data 'start-plan' does not match format '%d-%M-%Y' (match)
What am I doing wrong? Moreover, I have several columns in the same format that I would like to convert. Is there a possibility to convert them all at once?
Upvotes: 0
Views: 584
Reputation: 294258
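You are making a pd.Series out of the literal string 'start-plan', not out of the column, so to_datetime tries to parse the text 'start-plan' itself. Pass the column instead, and note that a four-digit year needs %Y rather than %y:
pd.to_datetime(df['start-plan'], format='%d-%m-%Y')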
You can also use the option dayfirst=True.
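To make the difference concrete, here is a minimal sketch. The two-row frame is made up for illustration and only stands in for activities.csv, and the names literal, column and parsed are mine. It assumes four-digit years as in the question's data, which is why the format uses %Y.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from activities.csv
df = pd.DataFrame({"start-plan": ["12-01-2017", "05-04-2017"]})

# pd.Series('start-plan') wraps the text itself: one element, the string 'start-plan'
literal = pd.Series("start-plan")

# df['start-plan'] selects the column that holds the date strings
column = df["start-plan"]

# %d = day, %m = month, %Y = four-digit year
parsed = pd.to_datetime(column, format="%d-%m-%Y")
print(parsed)
```

Passing literal to to_datetime with that format is what produces the "does not match format" error, because the only value it has to parse is the column name.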
You can convert all of them in one go like this:
cols = ['start-plan', 'start-actual', 'end-plan', 'end-actual']
df = df[cols].apply(
    pd.to_datetime, dayfirst=True
).join(df.drop(columns=cols))
print(df)
start-plan start-actual end-plan end-actual user late
0 2017-01-12 2017-01-16 2017-02-11 2017-02-10 1 0
1 2017-05-11 2017-05-15 2017-06-10 2017-06-18 2 1
2 2017-08-20 2017-08-20 2017-09-19 2017-10-05 3 1
3 2017-12-10 2017-12-10 2018-01-09 2018-01-08 1 0
4 2017-04-25 2017-04-25 2017-05-25 2017-05-26 4 0
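If you would rather keep an explicit format than rely on dayfirst, a variant along these lines should also work. This is only a sketch: the small frame and the name date_cols are made up for illustration, and it assumes every listed column uses the same %d-%m-%Y layout.

```python
import pandas as pd

# Hypothetical reduced version of the question's data
df = pd.DataFrame({
    "start-plan": ["12-01-2017", "11-05-2017"],
    "end-plan": ["11-02-2017", "10-06-2017"],
    "user": [1, 2],
})

date_cols = ["start-plan", "end-plan"]

# apply() forwards the format keyword to pd.to_datetime for each column;
# assigning back to the same columns keeps the original column order
df[date_cols] = df[date_cols].apply(pd.to_datetime, format="%d-%m-%Y")
print(df.dtypes)
```

Assigning back into df[date_cols] avoids the drop-and-join step, and the non-date columns are left untouched.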
Upvotes: 4