Olivia
Olivia

Reputation: 814

python date and datetime from multiple columns

I have some data that looks like:

        key  DATE - DAY  DATE - MONTH  DATE - YEAR  GMT HRS  GMT MINUTES  
1        2          29             2         2016        2            2   
2        3          29             2         2016        2            2   
3        4          29             2         2016        2            2   
4        5          29             2         2016        2            2   
5        6          29             2         2016        2            2   
6        7          29             2         2016        2            2   
7        8          29             2         2016        2            3   
8        9          29             2         2016        2            3   
9       10          29             2         2016        2            3   

   GMT SECONDS  
1           54                  
2           55                 
3           56                 
4           57                   
5           58              
6           59          
7            0      
8            1  
9            2 

At first the data was type float and the year was in format 16 so I did:

t['DATE - MONTH'] = t['DATE - MONTH'].astype(int)
t['DATE - YEAR'] = t['DATE - YEAR'].astype(int)
t['DATE - YEAR'] = t['DATE - YEAR']+2000
t['DATE - DAY'] = t['DATE - DAY'].astype(int)

^Note I was also confused why when using an index number rather than the column name you only work on what seems to be a temp table ie you can print the desired result but it didnt change the data frame.

Then I tried two methods:

t['Date'] = pd.to_datetime(dict(year=t['DATE - YEAR'], month = t['DATE - MONTH'], day = t['DATE - DAY']))

t['Date'] = pd.to_datetime((t['DATE - YEAR']*10000+t['DATE - MONTH']*100+t['DATE - DAY']).apply(str),format='%Y%m%d')

Both return:

ValueError: cannot assemble the datetimes: time data 20000000 does not match format '%Y%m%d' (match)

I'd like to create a date column (and then after use a similar logic for a datetime column with the additional 3 columns).

What is the problem?

EDIT: I had bad data and added errors='coerce' to handle those rows

Upvotes: 2

Views: 3553

Answers (1)

jezrael
jezrael

Reputation: 862581

First rename all columns, filter by values of dict and use to_datetime:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns']) or plurals of the same.

d = {'DATE - YEAR':'year','DATE - MONTH':'month','DATE - DAY':'day',
     'GMT HRS':'hour','GMT MINUTES':'minute','GMT SECONDS':'second'}

df['datetime'] = pd.to_datetime(df.rename(columns=d)[list(d.values())])

print (df)
   key  DATE - DAY  DATE - MONTH  DATE - YEAR  GMT HRS  GMT MINUTES  \
1    2          29             2         2016        2            2   
2    3          29             2         2016        2            2   
3    4          29             2         2016        2            2   
4    5          29             2         2016        2            2   
5    6          29             2         2016        2            2   
6    7          29             2         2016        2            2   
7    8          29             2         2016        2            3   
8    9          29             2         2016        2            3   
9   10          29             2         2016        2            3   

   GMT SECONDS            datetime  
1           54 2016-02-29 02:02:54  
2           55 2016-02-29 02:02:55  
3           56 2016-02-29 02:02:56  
4           57 2016-02-29 02:02:57  
5           58 2016-02-29 02:02:58  
6           59 2016-02-29 02:02:59  
7            0 2016-02-29 02:03:00  
8            1 2016-02-29 02:03:01  
9            2 2016-02-29 02:03:02  

Detail:

print (df.rename(columns=d)[list(d.values())])
   day  month  second  year  minute  hour
1   29      2      54  2016       2     2
2   29      2      55  2016       2     2
3   29      2      56  2016       2     2
4   29      2      57  2016       2     2
5   29      2      58  2016       2     2
6   29      2      59  2016       2     2
7   29      2       0  2016       3     2
8   29      2       1  2016       3     2
9   29      2       2  2016       3     2

Upvotes: 3

Related Questions