Reputation: 83
I have a dataframe with 13 columns, where two columns are "Date" and "Time".
I would like to 1) join "Date" and "Time" and 2) format the result correctly (DD.MM.YYYY HH:MM:SS)
Date Time
0 2012-09-27 00:40:20
1 2012-09-27 07:24:37
2 2012-09-27 07:25:08
3 2012-09-27 07:43:48
4 2012-09-27 07:44:27
5 2012-09-27 07:55:16
6 2012-09-27 08:04:19
7 2012-09-27 08:08:21
8 2012-09-27 08:09:08
9 2012-09-27 08:15:30
10 2012-09-27 08:15:35
So I try:
>type(df["Date"])
pandas.core.series.Series
And I think to myself "Why not convert them both in to string representations and manipulate them that way?"
>g = df["Time"].astype(str)
>type(g)
pandas.core.series.Series
"What?! It's still a Series?!"
So I try to be creative:
>f = df["Date"].astype(str) + " " + df["Time"].astype(str)
>f
0 2012-09-27 00:40:20
1 2012-09-27 07:24:37
2 2012-09-27 07:25:08
3 2012-09-27 07:43:48
"Hey! That works! No let's just get the format right (DD.MM.YYYY HH:MM:SS)"
>#dt.strptime(f, '%Y-%m-%d %H:%M:%S')
TypeError: strptime() argument 1 must be str, not Series
"Still a Series...hmmm....Obviously it won't let itself be converted. Let's try something else"
>f = p.concat([df["Date"], df["Time"]], axis=1)
>f
Date Time
0 2012-09-27 00:40:20
1 2012-09-27 07:24:37
2 2012-09-27 07:25:08
"Nice! But.... Isn't this TWO columns still...?"
>f.shape
(100,2)
"Right...back to square one...."
I'm out of ideas....
Anyone?
Upvotes: 2
Views: 9902
Reputation: 863651
I think you need join columns with to_datetime
:
print (type(df.loc[0, 'Date']))
<class 'str'>
print (type(df.loc[0, 'Time']))
<class 'str'>
df['datetime'] = pd.to_datetime(df['Date'] + " " + df['Time'])
If there is datetime with string
s:
print (type(df.loc[0, 'Date']))
<class 'pandas._libs.tslib.Timestamp'>
print (type(df.loc[0, 'Time']))
<class 'str'>
df['datetime'] = df['Date'] + pd.to_timedelta(df['Time'])
If there is datetime with time
s:
print (type(df.loc[0, 'Date']))
<class 'pandas._libs.tslib.Timestamp'>
print (type(df.loc[0, 'Time']))
<class 'datetime.time'>
df['datetime'] = df['Date'] + pd.to_timedelta(df['Time'].astype(str))
And if there are dates with times:
print (type(df.loc[0, 'Date']))
<class 'datetime.date'>
print (type(df.loc[0, 'Time']))
<class 'datetime.time'>
df['datetime'] = pd.to_datetime(df["Date"].astype(str) + " " + df["Time"].astype(str))
print (df)
Date Time datetime
0 2012-09-27 00:40:20 2012-09-27 00:40:20
1 2012-09-27 07:24:37 2012-09-27 07:24:37
2 2012-09-27 07:25:08 2012-09-27 07:25:08
3 2012-09-27 07:43:48 2012-09-27 07:43:48
4 2012-09-27 07:44:27 2012-09-27 07:44:27
5 2012-09-27 07:55:16 2012-09-27 07:55:16
6 2012-09-27 08:04:19 2012-09-27 08:04:19
7 2012-09-27 08:08:21 2012-09-27 08:08:21
8 2012-09-27 08:09:08 2012-09-27 08:09:08
9 2012-09-27 08:15:30 2012-09-27 08:15:30
10 2012-09-27 08:15:35 2012-09-27 08:15:35
Upvotes: 4
Reputation: 1351
the columns can be joined and assigned to a new or existing column in the dataframe:
df['datetime'] = df['dates'] + " " + df['time']
outputs example:
dates time datetime
0 2012-09-27 00:40:20 2012-09-27 00:40:20
1 2012-09-27 07:24:37 2012-09-27 07:24:37
2 2012-09-27 07:25:08 2012-09-27 07:25:08
Upvotes: 2
Reputation: 12156
.astype(str)
works on the elements of Series, not the Series itself, so of course type(df["Time"].astype(str)) == pd.Series
. This seems to be the source of much of your confusion, you're acting on the Series, not its elements.
A solution (there maybe an easier way) is to just loop over the series:
dts = [datetime.datetime.strptime(elem, '%Y-%m-%d%H:%M:%S')
for elem in df['Date'] + df['Time']]
fmted = [elem.strftime('%d-%m-%Y %H:%M:%S') for elem in dts]
df.insert(0, 'DateTime', fmted)
Upvotes: 1