charlie_boy

Reputation: 101

Converting timestamp column in custom string format to seconds in python

I have a pandas DataFrame with multiple columns containing timestamp values in string format. The values look like '0D_0H_1M_35S_0MS'. I would like to convert all the values in these columns to milliseconds.

Upvotes: 1

Views: 152

Answers (1)

jezrael

Reputation: 862481

First, use Series.replace to rewrite the strings into a format that to_timedelta can parse. Then convert to seconds with Series.dt.total_seconds and multiply by 1000 to get milliseconds. Wrap this in a custom function and apply it to each column with DataFrame.apply:

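import pandas as pd

cols = ['date','date1']

def parse(x):
    # Rewrite e.g. '0D_0H_1M_35S_0MS' as '0 Days 0:1:35.0', which to_timedelta can parse
    s = x.replace({'D_':' Days ','H_':':','M_':':','S_':'.','MS':''}, regex=True)
    # Timedelta -> seconds -> integer milliseconds
    return pd.to_timedelta(s).dt.total_seconds().mul(1000).astype(int)

# apply passes each selected column to parse as a Series
df[cols] = df[cols].apply(parse)
print(df)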
     date   date1
0   95000   95000
1  155000  155000
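
To make the replacement step easier to follow, here is a small sketch that prints the intermediate strings handed to to_timedelta. The sample Series and the names mapping, rewritten and ms are illustrative and not part of the answer above. The third value is an assumed input with a non-zero MS field; it is included because the 'S_' -> '.' replacement turns the MS digits into a decimal fraction of a second, so '1S_5MS' is read as 1.5 seconds rather than 1 second plus 5 milliseconds unless the MS field is zero-padded to three digits.

```python
import pandas as pd

# Illustrative data: the two values from the question plus one assumed
# value with a non-zero MS field.
s = pd.Series(['0D_0H_1M_35S_0MS', '0D_0H_2M_35S_0MS', '0D_0H_0M_1S_5MS'])

# Same replacement table as in parse() above.
mapping = {'D_': ' Days ', 'H_': ':', 'M_': ':', 'S_': '.', 'MS': ''}

# Strings after the replacement, before any timedelta parsing.
rewritten = s.replace(mapping, regex=True)
print(rewritten.tolist())

# Timedelta -> seconds -> integer milliseconds.
ms = pd.to_timedelta(rewritten).dt.total_seconds().mul(1000).astype(int)
print(ms.tolist())
```

If the MS field in your data can be non-zero and is not always three digits wide, the digit-extraction approach below may be the safer choice, since it treats the MS field as a whole number of milliseconds.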

If the format never changes, you can instead extract the digits with Series.str.extractall, reshape with Series.unstack, convert to integers, multiply by a Series holding the number of milliseconds in each unit, and finally sum the values per row.

This solution is inspired by a deleted answer.

import pandas as pd

df = pd.DataFrame({'date':['0D_0H_1M_35S_0MS','0D_0H_2M_35S_0MS'],
                   'date1':['0D_0H_1M_35S_0MS','0D_0H_2M_35S_0MS']})

cols = ['date','date1']

def parse(x):
    # Milliseconds per unit, in field order: D, H, M, S, MS
    a = pd.Series([86400000, 3600000, 60000, 1000, 1])
    # One column per numeric field, scaled by its unit and summed per row
    return x.str.extractall(r'(\d+)')[0].unstack().astype(int).mul(a).sum(axis=1)

df[cols] = df[cols].apply(parse)

print(df)
     date   date1
0   95000   95000
1  155000  155000
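
As a cross-check of the arithmetic, the same weighted sum can be sketched in plain Python for a single string. This is only an illustration: to_ms and UNIT_MS are hypothetical names, and the code assumes every value has exactly five numeric fields in the order D, H, M, S, MS.

```python
import re

# Milliseconds per unit, in the order the fields appear: D, H, M, S, MS.
UNIT_MS = (86_400_000, 3_600_000, 60_000, 1_000, 1)

def to_ms(value):
    """Hypothetical helper: convert one 'xD_xH_xM_xS_xMS' string to milliseconds."""
    digits = re.findall(r'\d+', value)
    if len(digits) != len(UNIT_MS):
        raise ValueError(f'expected 5 numeric fields, got {len(digits)}: {value!r}')
    # Scale each field by its unit and add everything up.
    return sum(int(d) * unit for d, unit in zip(digits, UNIT_MS))

print(to_ms('0D_0H_1M_35S_0MS'))  # 95000
print(to_ms('0D_0H_2M_35S_0MS'))  # 155000
```

A function like this could be applied to one column at a time with Series.map, for example df['date'].map(to_ms), although the vectorized version above is likely to be faster on large frames.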

Upvotes: 3
