uh_big_mike_boi
uh_big_mike_boi

Reputation: 3470

Python remove specific zeros from string

I have a csv which I am loading into a pandas dataframe.

I want to change a specific column to lose leading zeros from the month and date, and seperates the hour into another column.

For example, I have a column, called DateTime comes in like "03/17 04:00:00" or "01/01 15:00:00" and I would like to change the column (and also I guess to concatenate 2010 to the end of the date) to a "Date" Column with "3/17/2010" and a "Time" Column with "4:00:00" or like the second example, a "Date" column with "1/1/2010" and a "time" column with "15:00:00".

I'm pretty sure I would like to do this with regex_replace, I think that would be easiest?

Upvotes: 1

Views: 375

Answers (3)

DannyLeeAU
DannyLeeAU

Reputation: 162

The datetime module has what you need with strptime. You should save regex as a last resort, as it's less readable and more prone to error.

classmethod datetime.strptime(date_string, format)

Return a datetime corresponding to date_string, parsed according to format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])). ValueError is raised if the date_string and format can’t be parsed by time.strptime() or if it returns a value which isn’t a time tuple. For a complete list of formatting directives, see strftime() and strptime() Behavior.

The documentation contains the directives, but here is an example you could implement in your code:

import datetime


def convert_time(tm):
    dt = datetime.datetime.strptime(tm, "%m/%d %H:%M:%S")
    dt.year = 2010

    parsed_date = "{month}/{day}/{year}".format(*dt)
    parsed_time = "{hour}:{minute}:{second}".format(*dt)

    return parsed_date, parsed_time

Upvotes: 1

jezrael
jezrael

Reputation: 863751

You can use str.split with str.lstrip:

df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.lstrip('0') + '/2010'
print (df)
         DateTime       Date      Time
0  03/17 04:00:00  3/17/2010  04:00:00
1  01/01 15:00:00  1/01/2010  15:00:00

Similar solution with str.replace by regex ^0 (first 0 in string):

df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
df.Date = df.Date.str.replace('^0', '') + '/2010'

Timings:

In [10]: %timeit (jez(df2))
1 loop, best of 3: 483 ms per loop

In [11]: %timeit (jez(df))
1 loop, best of 3: 500 ms per loop

In [12]: %timeit (user39(df1))
1 loop, best of 3: 740 ms per loop

Code for timings:

df = pd.DataFrame({'DateTime': ["03/17 04:00:00", "01/01 15:00:00"]})
df = pd.concat([df]*100000).reset_index(drop=True)
#[200000 rows x 3 columns]
print (df)
df1 = df.copy()
df2 = df.copy()

def jez(df):
    df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
    df.Date = df.Date.str.lstrip('0') + '/2010'
    return (df)

def jez1(df):
    df[['Date', 'Time']] = df.DateTime.str.split(expand=True)
    df.Date = df.Date.str.replace('^0', '') + '/2010'
    return (df)    

def user39(df):
    dates = []
    times = []

    for x in df.DateTime:
        d, t = x.split()
        d = re.sub("^0", "", d)
        d += "/2010"
        dates.append(d)
        times.append(t)

    df['Date'] = dates
    df['Time'] = times
    return df

print (jez(df))
print (jez(df2))
print (user39(df1))

Upvotes: 1

calico_
calico_

Reputation: 1221

Keep in mind that this assumes your data is all formatted the same:

df = pd.DataFrame({'DateTime': ["03/17 04:00:00", "01/01 15:00:00"]})

dates = []
times = []

for x in df.DateTime:
    d, t = x.split()
    d = re.sub("^0", "", d)
    d += "/2010"
    dates.append(d)
    times.append(t)

df['Date'] = dates
df['Time'] = times

Upvotes: 1

Related Questions