Reputation: 3470
I have a csv which I am loading into a pandas dataframe.
I want to change a specific column to lose leading zeros from the month and date, and seperates the hour into another column.
For example, I have a column, called DateTime comes in like "03/17 04:00:00" or "01/01 15:00:00" and I would like to change the column (and also I guess to concatenate 2010 to the end of the date) to a "Date" Column with "3/17/2010" and a "Time" Column with "4:00:00" or like the second example, a "Date" column with "1/1/2010" and a "time" column with "15:00:00".
I'm pretty sure I would like to do this with regex_replace, I think that would be easiest?
Upvotes: 1
Views: 375
Reputation: 162
The datetime
module has what you need with strptime
. You should save regex as a last resort, as it's less readable and more prone to error.
classmethod datetime.strptime(date_string, format)
Return a
datetime
corresponding to date_string, parsed according to format. This is equivalent todatetime(*(time.strptime(date_string, format)[0:6]))
.ValueError
is raised if the date_string and format can’t be parsed bytime.strptime()
or if it returns a value which isn’t a time tuple. For a complete list of formatting directives, see strftime() and strptime() Behavior.
The documentation contains the directives, but here is an example you could implement in your code:
import datetime
def convert_time(tm):
dt = datetime.datetime.strptime(tm, "%m/%d %H:%M:%S")
dt.year = 2010
parsed_date = "{month}/{day}/{year}".format(*dt)
parsed_time = "{hour}:{minute}:{second}".format(*dt)
return parsed_date, parsed_time
Upvotes: 1
Reputation: 863751
You can use str.split
with str.lstrip
:
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.lstrip('0') + '/2010'
print (df)
DateTime Date Time
0 03/17 04:00:00 3/17/2010 04:00:00
1 01/01 15:00:00 1/01/2010 15:00:00
Similar solution with str.replace
by regex ^0
(first 0
in string):
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.replace('^0', '') + '/2010'
Timings:
In [10]: %timeit (jez(df2))
1 loop, best of 3: 483 ms per loop
In [11]: %timeit (jez(df))
1 loop, best of 3: 500 ms per loop
In [12]: %timeit (user39(df1))
1 loop, best of 3: 740 ms per loop
Code for timings:
df = pd.DataFrame({'DateTime': ["03/17 04:00:00", "01/01 15:00:00"]})
df = pd.concat([df]*100000).reset_index(drop=True)
#[200000 rows x 3 columns]
print (df)
df1 = df.copy()
df2 = df.copy()
def jez(df):
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.lstrip('0') + '/2010'
return (df)
def jez1(df):
df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.replace('^0', '') + '/2010'
return (df)
def user39(df):
dates = []
times = []
for x in df.DateTime:
d, t = x.split()
d = re.sub("^0", "", d)
d += "/2010"
dates.append(d)
times.append(t)
df['Date'] = dates
df['Time'] = times
return df
print (jez(df))
print (jez(df2))
print (user39(df1))
Upvotes: 1
Reputation: 1221
Keep in mind that this assumes your data is all formatted the same:
df = pd.DataFrame({'DateTime': ["03/17 04:00:00", "01/01 15:00:00"]})
dates = []
times = []
for x in df.DateTime:
d, t = x.split()
d = re.sub("^0", "", d)
d += "/2010"
dates.append(d)
times.append(t)
df['Date'] = dates
df['Time'] = times
Upvotes: 1