Reputation: 1772
I want to add an np.datetime64 column to a pandas DataFrame that has been read from a .csv file containing columns for year, month, day, hour and minute, and use it as the index. I have combined the separate columns to make a column of datetime strings.
import numpy as np
import pandas as pd
filename = 'test.csv'
# read the csv file into df, keeping every column as a string
df = pd.read_csv(
    filename,
    header=0,
    usecols=[2, 3, 4, 5, 6],
    names=['y', 'm', 'd', 'h', 'min'],
    dtype={'y': 'str', 'm': 'str', 'd': 'str', 'h': 'str', 'min': 'str'},
)
df['datetimetext'] = (df['y']+'-'+df['m']+'-'+df['d']+' '+df['h']+':'+df['min']+':00')
So the dataframe looks like this:
y m d h min datetimetext
0 1993 09 06 00 30 1993-09-06 00:30:00
1 1993 09 06 01 00 1993-09-06 01:00:00
2 1993 09 06 01 30 1993-09-06 01:30:00
3 1993 09 06 02 00 1993-09-06 02:00:00
4 1993 09 06 02 30 1993-09-06 02:30:00
......
Now I want to add a column holding the datetime as np.datetime64. I would like to write:
df['datetime'] = np.datetime64(df['datetimetext'])
but that raises an error:
ValueError: Could not convert object to NumPy datetime
Do I need to iterate through each row of the dataframe, or is there a more elegant solution?
Upvotes: 4
Views: 9917
Reputation: 294218
The easiest way with what you have is:
df['datetime'] = pd.to_datetime(df['datetimetext'])
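As a minimal sketch of that one-liner on data shaped like the question's: the sample rows below are copied from the question's printout, while the explicit format string, the dtype check and the set_index call are illustrative additions rather than part of the original answer. np.datetime64 builds a single scalar, which is why passing it a whole Series fails; pd.to_datetime converts the whole column at once.

```python
import pandas as pd

# Sample rows copied from the question's printout (illustrative stand-in for the CSV).
df = pd.DataFrame({
    "datetimetext": [
        "1993-09-06 00:30:00",
        "1993-09-06 01:00:00",
        "1993-09-06 01:30:00",
    ]
})

# Vectorised conversion of the whole column; no row-by-row loop is needed.
# Passing an explicit format is optional, but it skips format inference.
df["datetime"] = pd.to_datetime(df["datetimetext"], format="%Y-%m-%d %H:%M:%S")

# The resulting column has a NumPy-backed datetime64 dtype.
is_datetime64 = pd.api.types.is_datetime64_any_dtype(df["datetime"])

# Use the parsed column as the index, as the question asks.
indexed = df.set_index("datetime")
```

Here indexed carries a DatetimeIndex, so label-based time slicing such as indexed.loc["1993-09-06 01"] becomes available.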
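However, if your columns are named appropriately (pd.to_datetime accepts a DataFrame that has at least year, month and day columns, plus optional time-of-day columns), you can convert directly from the named columns. I renamed your columns using rename: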
m = dict(y='year', m='month', d='day', h='h', min='m')
# rename columns and get rid of datetimetext
df = df[['y', 'm', 'd', 'h', 'min']].rename(columns=m)
df
Next, I'll perform the conversion and assign the result to the index in one shot:
df.index = pd.to_datetime(df)
df
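For reference, here is a self-contained sketch of the column-assembly approach. The sample frame is a hypothetical stand-in for the CSV, the long column names hour and minute are used instead of the h and m abbreviations above, and the astype(int) step is a cautious extra that may not be strictly necessary.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: every part is a string, as in the question.
raw = pd.DataFrame({
    "y": ["1993", "1993", "1993"],
    "m": ["09", "09", "09"],
    "d": ["06", "06", "06"],
    "h": ["00", "01", "01"],
    "min": ["30", "00", "30"],
})

# Rename to the component names that pd.to_datetime recognises.
names = {"y": "year", "m": "month", "d": "day", "h": "hour", "min": "minute"}
parts = raw.rename(columns=names).astype(int)  # cautious: make the parts numeric first

# Assemble one timestamp per row from the component columns and use it as the index.
raw.index = pd.to_datetime(parts)
```

This skips the intermediate datetimetext string column entirely, since the timestamps are built straight from the numeric parts.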
Upvotes: 6