Reputation: 31
I have the following data, with separate columns for year, month, day, and hour_minute (the first two digits are the hour and the last two digits are the minutes). How do I create a new datetime column by combining all of these existing columns?
YEAR | MONTH | DAY | HOUR_MINUTE |
---|---|---|---|
2015 | 1 | 15 | 0010 |
2015 | 1 | 2 | 0020 |
2015 | 1 | 15 | 0045 |
2015 | 1 | 15 | 2110 |
2015 | 10 | 21 | 2359 |
I have tried the following, but had no luck. Thank you for your advice.
df["new_column"]= pd.to_datetime(df[["YEAR", "MONTH", "DAY","HOUR_MINUTE"]])
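A minimal reproduction of that attempt follows. It assumes all four columns hold strings, as in the table above. When `pd.to_datetime` receives a DataFrame, it only recognizes unit column names such as year, month, day, hour, and minute, matched case-insensitively. The unknown `HOUR_MINUTE` column therefore makes it raise a `ValueError` instead of being split into hours and minutes.

```python
import pandas as pd

# Sample data from the question (assumption: every column is a string and
# HOUR_MINUTE keeps its leading zeros, e.g. "0010").
df = pd.DataFrame({
    "YEAR": ["2015", "2015", "2015", "2015", "2015"],
    "MONTH": ["1", "1", "1", "1", "10"],
    "DAY": ["15", "2", "15", "15", "21"],
    "HOUR_MINUTE": ["0010", "0020", "0045", "2110", "2359"],
})

error = None
try:
    # HOUR_MINUTE is not a unit name that to_datetime can assemble,
    # so this call raises instead of returning datetimes.
    pd.to_datetime(df[["YEAR", "MONTH", "DAY", "HOUR_MINUTE"]])
except ValueError as exc:
    error = exc

print(error)
```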
Upvotes: 2
Views: 943
Reputation: 2263
Suggested script
import pandas as pd
df1 = pd.DataFrame({'YEAR': ['2015', '2015', '2015', '2015', '2015'],
'MONTH': ['1', '1', '1', '1', '10'],
'DAY': ['15', '2', '15', '15', '21'],
'HOUR_MINUTE': ['0010', '0020', '0045', '2110', '2359']
})
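# Build the template "{0[YEAR]}-{0[MONTH]}-{0[DAY]}-{0[HOUR_MINUTE]}" from every column
# (so df1 must contain only these four) and fill it row by row, e.g. "2015-1-15-0010"
df1['FMT'] = df1.agg('-'.join(['{0[%s]}'%c for c in df1.columns]).format, axis=1)
# Parse the dash-separated strings; HOUR_MINUTE is read as %H%M
df1['FMT'] = pd.to_datetime(df1['FMT'], format='%Y-%m-%d-%H%M')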
print(df1)
Output
YEAR MONTH DAY HOUR_MINUTE FMT
0 2015 1 15 0010 2015-01-15 00:10:00
1 2015 1 2 0020 2015-01-02 00:20:00
2 2015 1 15 0045 2015-01-15 00:45:00
3 2015 1 15 2110 2015-01-15 21:10:00
4 2015 10 21 2359 2015-10-21 23:59:00
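The line that builds FMT is dense. The small sketch below walks through it using two rows of the same data. The list comprehension produces a `str.format` template with one `{0[column]}` field per column. Formatting a row (a Series) with that template looks each column up by name. The dashes keep the fields apart, so the non-zero-padded month "1" still parses.

```python
import pandas as pd

df1 = pd.DataFrame({
    "YEAR": ["2015", "2015"],
    "MONTH": ["1", "10"],
    "DAY": ["15", "21"],
    "HOUR_MINUTE": ["0010", "2359"],
})

# One "{0[col]}" field per column, joined with "-".
template = "-".join(["{0[%s]}" % c for c in df1.columns])
print(template)  # {0[YEAR]}-{0[MONTH]}-{0[DAY]}-{0[HOUR_MINUTE]}

# "{0[YEAR]}" means "argument 0, item 'YEAR'", i.e. row["YEAR"].
first_row = template.format(df1.iloc[0])
print(first_row)  # 2015-1-15-0010

# The separators let %m and %d accept one-digit values such as "1".
parsed = pd.to_datetime(first_row, format="%Y-%m-%d-%H%M")
print(parsed)  # 2015-01-15 00:10:00
```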
Upvotes: 0
Reputation: 4662
You can apply on the entire df if it has only the YEAR, MONTH, DAY and HOUR_MINUTE columns, like this (with MONTH and DAY zero-padded to two digits and HOUR_MINUTE to four):
df.apply(lambda row: pd.to_datetime('{}{:0>2}{:0>2}{:0>4}'.format(*row.values.astype(str)), format="%Y%m%d%H%M"), axis=1)
Out[198]:
0 2015-01-15 00:10:00
1 2015-01-02 00:20:00
2 2015-01-15 00:45:00
3 2015-01-15 21:10:00
4 2015-10-21 23:59:00
dtype: datetime64[ns]
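Padding matters because a plain `''.join` loses the boundaries between the fields. For example, "2015", "1", "15" and "0010" become "20151150010", which `%Y%m%d%H%M` can read as 2015-11-05 00:10. Similarly, "2015120020" can come out as 2015-01-20 02:00. The small sketch below uses the first two rows of the question's data. Left-padding with an f-string is one way to get fixed-width fields; `str.zfill` would work too.

```python
import pandas as pd

# First two rows of the question's data, kept as strings.
rows = [("2015", "1", "15", "0010"), ("2015", "1", "2", "0020")]

joined_values = []
parsed = []
for year, month, day, hour_minute in rows:
    # Plain concatenation: the field boundaries are lost.
    joined_values.append("".join([year, month, day, hour_minute]))
    # Left-pad MONTH/DAY to 2 digits and HOUR_MINUTE to 4 digits so each
    # field sits at the fixed position that "%Y%m%d%H%M" expects.
    padded = f"{year}{month:0>2}{day:0>2}{hour_minute:0>4}"
    parsed.append(pd.to_datetime(padded, format="%Y%m%d%H%M"))

print(joined_values)  # ['20151150010', '2015120020']
print(parsed)
```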
If there are other columns as well, select just the required columns first and then apply:
df[['YEAR', 'MONTH', 'DAY', 'HOUR_MINUTE']].apply(lambda row: pd.to_datetime('{}{:0>2}{:0>2}{:0>4}'.format(*row.values.astype(str)), format="%Y%m%d%H%M"), axis=1)
Out[201]:
0 2015-01-15 00:10:00
1 2015-01-02 00:20:00
2 2015-01-15 00:45:00
3 2015-01-15 21:10:00
4 2015-10-21 23:59:00
dtype: datetime64[ns]
If you want the result assigned to df as new_column:
df['new_column'] = df[['YEAR', 'MONTH', 'DAY', 'HOUR_MINUTE']].apply(lambda row: pd.to_datetime('{}{:0>2}{:0>2}{:0>4}'.format(*row.values.astype(str)), format="%Y%m%d%H%M"), axis=1)
df
Out[205]:
YEAR MONTH DAY HOUR_MINUTE new_column
0 2015 1 15 0010 2015-01-15 00:10:00
1 2015 1 2 0020 2015-01-02 00:20:00
2 2015 1 15 45 2015-01-15 00:45:00
3 2015 1 15 2110 2015-01-15 21:10:00
4 2015 10 21 2359 2015-10-21 23:59:00
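Calling `pd.to_datetime` once per row through `apply` can be slow on large frames. Below is a sketch of a column-wise alternative that builds the same zero-padded strings with vectorized string methods and parses them in one call. It assumes the same four columns. The mix of integer and string dtypes in the sample is also an assumption, included to show that `astype(str)` plus `Series.str.zfill` handles both.

```python
import pandas as pd

# Assumed sample: some columns read as integers, HOUR_MINUTE as strings.
df = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "MONTH": [1, 1, 1, 1, 10],
    "DAY": [15, 2, 15, 15, 21],
    "HOUR_MINUTE": ["0010", "0020", "0045", "2110", "2359"],
})

# Convert each column to str and zero-pad it, column by column.
stamp = (
    df["YEAR"].astype(str)
    + df["MONTH"].astype(str).str.zfill(2)
    + df["DAY"].astype(str).str.zfill(2)
    + df["HOUR_MINUTE"].astype(str).str.zfill(4)
)

# One to_datetime call for the whole column.
df["new_column"] = pd.to_datetime(stamp, format="%Y%m%d%H%M")
print(df)
```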
Upvotes: 0
Reputation: 3706
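You need to split the HOUR_MINUTE column into HOUR and MINUTE columns. pd.to_datetime can then assemble a datetime from the YEAR, MONTH, DAY, HOUR and MINUTE columns, because it matches those column names case-insensitively.
# HOUR_MINUTE must be a zero-padded string: first two characters are the hour, last two the minutes
df["HOUR"] = df["HOUR_MINUTE"].str[0:2]
df["MINUTE"] = df.pop("HOUR_MINUTE").str[2:4]
# A DataFrame is assembled from its column names, so no format string is needed
df["new_column"] = pd.to_datetime(df[["YEAR", "MONTH", "DAY", "HOUR", "MINUTE"]])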
print(df)
Output:
YEAR MONTH DAY HOUR MINUTE new_column
0 2015 1 15 00 10 2015-01-15 00:10:00
1 2015 1 2 00 20 2015-01-02 00:20:00
2 2015 1 15 00 45 2015-01-15 00:45:00
3 2015 1 15 21 10 2015-01-15 21:10:00
4 2015 10 21 23 59 2015-10-21 23:59:00
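The `.str` slicing above assumes HOUR_MINUTE is stored as zero-padded strings such as "0045". If the column was read in as integers (so "0045" became 45), one option is to split it arithmetically instead. The sketch below works under that assumption. It uses lowercase unit names so `pd.to_datetime` can assemble the columns directly.

```python
import pandas as pd

# Assumption: every column, including HOUR_MINUTE, was read as integers.
df = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "MONTH": [1, 1, 1, 1, 10],
    "DAY": [15, 2, 15, 15, 21],
    "HOUR_MINUTE": [10, 20, 45, 2110, 2359],
})

# Integer division and modulo split HHMM into hours and minutes.
parts = pd.DataFrame({
    "year": df["YEAR"],
    "month": df["MONTH"],
    "day": df["DAY"],
    "hour": df["HOUR_MINUTE"] // 100,
    "minute": df["HOUR_MINUTE"] % 100,
})

df["new_column"] = pd.to_datetime(parts)
print(df)
```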
Upvotes: 1