How to replace regex pattern hh:mm:ss with hh:mm in python

Question

I'm sorry if this is a simple question, but I have a CSV file with two different time formats: hh:mm and hh:mm:ss.

An extract of the file looks like this:

column_name
00:00:00
01:00:00
05:00
02:00:00
03:00:00
06:00
...
23:00:00
00:00:00

I have the following regex to match the hh:mm:ss times: [0-9]{2}[:][0-9]{2}[:][0-9]{2}

My question is: how do I get rid of the last colon and the seconds (:ss) in the hh:mm:ss values while leaving the hh:mm values unchanged? Essentially, I want to convert only the hh:mm:ss values to hh:mm in a Python script, so that every value ends up in the format hh:mm.

I tried using this line:

df['column_name'] = [re.sub(r'[0-9]{2}[:][0-9]{2}[:][0-9]{2}', r'[0-9]{2}[:][0-9]{2}', str(x)) for x in df['column_name']]

But this changed the matched times to the literal text [0-9]{2}[:][0-9]{2}

I also tried slicing with [:-3], but I could not get it to work:

df['column_name'] = [re.sub(r'[0-9]{2}[:][0-9]{2}[:][0-9]{2}', [:-3], str(x)) for x in df['column_name']]

Any help would be much appreciated. Thank you.

EDIT: I previously thought @RJ Adriaansen's answer was correct, but on closer inspection it also changed the hh:mm values, which need to stay the same. The fix therefore had to use regex, so @Haleemur Ali's answer was the one that solved it for me.

Haleemur Ali · Accepted Answer

If you want to use regex to tackle this, use capture groups:

df['column_name'] = df.column_name.str.replace(r'(\d{2}):(\d{2}):(\d{2})', r'\1:\2', regex=True)

To define a capture group, enclose part of the pattern in parentheses. You can then refer to the captured text in the replacement by its group index (starting at one).
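To see capture groups at work without pandas, here is a minimal sketch using the standard library's re.sub in a list comprehension, as in the question's own attempt. The sample values come from the question's extract. The ^ and $ anchors and the single hh:mm group are my own choices for illustration, not part of the one-liner above.

```python
import re

# Sample values copied from the question's extract.
times = ["00:00:00", "01:00:00", "05:00", "02:00:00", "06:00"]

# Group 1 captures hh:mm; the trailing :ss is matched but not captured.
# The ^ and $ anchors (an assumption, not in the original pattern) make
# sure only values that are exactly hh:mm:ss are rewritten.
pattern = re.compile(r"^(\d{2}:\d{2}):\d{2}$")

# \1 in the replacement inserts whatever group 1 captured.
# Values that do not match (the hh:mm ones) are returned unchanged.
cleaned = [pattern.sub(r"\1", t) for t in times]

print(cleaned)
```

Because the hh:mm values never match the three-part pattern, they pass through untouched, which is the behaviour the question asks for.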

Also, it is important to note that since the groups are referenced with a backslash (\1, \2), a raw string should be used for the replacement.
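A small sketch of why the raw string matters, again with plain re.sub rather than pandas. In a normal string literal, Python reads "\1" as an octal escape (the character with code 1) before the regex engine ever sees it, so the group reference is lost. The variable names are only illustrative.

```python
import re

pattern = r"(\d{2}):(\d{2}):(\d{2})"

# Raw replacement string: \1 and \2 reach the regex engine as group references.
raw = re.sub(pattern, r"\1:\2", "01:00:00")

# Non-raw replacement string: "\1" and "\2" are octal escapes, i.e. the
# control characters chr(1) and chr(2), so no group reference is left.
not_raw = re.sub(pattern, "\1:\2", "01:00:00")

print(repr(raw))      # '01:00'
print(repr(not_raw))  # '\x01:\x02'
```

Doubling the backslashes ("\\1:\\2") would work as well, but the raw string is easier to read.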
