TaipanCodes
TaipanCodes

Reputation: 9

multiple columns to single datetime dataframe column

I have a data frame that contains (among others) columns for the time of day (00:00-23:59:59) day (1-7), month (1-12), and year (2000-2019). How can I combine the values of each of these columns on a row by row basis into a new DateTime object and then store these new date-times in a new column? I've read other posts pertaining to such a task but they all seem to involve one date column to one DateTime column whereas I have 4 columns that need to be transformed into DateTime. Any help is appreciated!

e.g.

| 4:30:59 | 1 | 1 | 2000 | TO 200/1/1 4:30:59

this is the only code I have so far which probably doesn't do anything

#creating datetime object (MISC)
data = pd.read_csv('road_accidents_data_clean.csv',delimiter=',')
df = pd.DataFrame(data)
format = '%Y-%m-%d %H:%M:%S'
n = 0
df['datetime'] = data.loc[n,'Crash_Day'],data.loc[n,'Crash_Month'],data.loc[n,'Year']

My DataFrame is layed out as follows:

Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender | 
  0      37   2000         1            1        4:30:59     DRIVER      MALE
  1      42   2000         1            1        7:45:10     DRIVER      MALE
  2      25   2000         1            1        10:15:30   PEDESTRIAN  FEMALE

Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
  UNKNOWN          1              YARRA      MELBOURNE          NaN
 OVERTAKING        1              YARRA      MELBOURNE          NaN
ADJACENT DIR       0              MONASH     MELBOURNE          NaN

NOTE: the dataframe is 13 columns wide i just couldn't fit them all on one line so Crash_Type starts to the right of Gender.

below is the code i've been suggested to use/my adaptation of it

df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],

))


data['Datetime'] = df['Datetime']=pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)

I've adapted this code in order to combine the values for the datetime column with the my original dataframe.

Upvotes: 1

Views: 78

Answers (1)

Phillyclause89
Phillyclause89

Reputation: 680

Combine the columns into a single series of stings using + (converting to str where needed with pandas.Series.astype method) then pass that new series into pd.to_datetime before assigning it to a new column in your df:

import pandas as pd

df = pd.DataFrame(dict(time=['4:30:59'],date=[1],month=[1],year=[2000]))

df['datetime'] = pd.to_datetime(
    df['year'].astype(str)+'-'+df['month'].astype(str)+'-'+df['date'].astype(str)+' '+df['time'],
    format = '%Y-%m-%d %H:%M:%S',
)
print(df)

example in python tutor

edit: You can also use a numpy.sum to make that one long line adding columns together easier on the eyes:

import pandas as pd
import numpy as np

df = pd.DataFrame(dict(
    time=['4:30:59','4:20:00'],
    date=[1,20],
    month=[1,4],
    year=[2000,2020],

))

df['datetime']=pd.to_datetime(
    np.sum([
        df['year'].astype(str),
        '-',
        df['month'].astype(str),
        '-',
        df['date'].astype(str),
        ' ',
        df['time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)

sum example in python tutor

edit 2: Using your actual column names, it should be something like this:

import pandas as pd
import numpy as np
'''
Index | Age | Year | Crash_Month | Crash_Day | Crash_Time | Road_User | Gender | 
  0      37   2000         1            1        4:30:59     DRIVER      MALE

Crash_Type | Injury_Severity | Crash_LGA | Crash_Area_Type | Datetime |
  UNKNOWN          1              YARRA      MELBOURNE          NaN
'''
df = pd.DataFrame(dict(
    Crash_Time=['4:30:59','4:20:00'],
    Crash_Day=[1,20],
    Crash_Month=[1,4],
    Year=[2000,2020],

))

df['Datetime']=pd.to_datetime(
    np.sum([
        df['Year'].astype(str),
        '-',
        df['Crash_Month'].astype(str),
        '-',
        df['Crash_Day'].astype(str),
        ' ',
        df['Crash_Time'],
    ]),
    format = '%Y-%m-%d %H:%M:%S',
)
print(df)

another python tutor link

One thing to note is that you might want to double check if your csv file is separated by just a comma or could it be a comma and a space? possible that you may need to load the data with df = pd.read_csv('road_accidents_data_clean.csv',sep=', ') if there is an extra space separating the data in addition to the comma. You don't want to have that extra space in your data.

Upvotes: 1

Related Questions