S_Scouse

Reputation: 231

Selecting part of a string in Pandas Series

I have a string like this: 2020-01-01T16:30.00 - 1.00. I want to select the substring between the T and the " - " separator, i.e. extract 16:30.00 from the whole string, and then convert it to a float. Any help is appreciated.

Upvotes: 1

Views: 688

Answers (1)

Ric S

Reputation: 9247

If you have a pandas Series s like this

import pandas as pd
s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])

you can use

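# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
s.str.replace(r".+T", "", regex=True).str.replace(r" -.+", "", regex=True)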
# 0    16:30.00
# 1    00:25.00
# dtype: object

Basically, you first replace everything up to and including the T with an empty string. Then you replace the part starting with " -" (note the whitespace before the hyphen) with an empty string as well.
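To make the two steps easier to follow, here is a small sketch that keeps the intermediate result. The names after_t and time_part are only illustrative, and regex=True is passed explicitly on the assumption that you may be running pandas 2.0 or later.

```python
import pandas as pd

s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])

# Step 1: drop everything up to and including the "T".
after_t = s.str.replace(r".+T", "", regex=True)
# after_t now holds "16:30.00 - 1.00" and "00:25.00 - 14.00"

# Step 2: drop the " -" separator and everything after it.
time_part = after_t.str.replace(r" -.+", "", regex=True)
# time_part now holds "16:30.00" and "00:25.00"
```

Note that .+T is greedy, so it matches up to the last T in the string; that is fine here because each value contains only one T.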


Another option is to use capture groups in a regular expression and keep only one of them (in this case the second group, (.+), which sits between the T and the " -").

import re
s.apply(lambda x: re.match("(.+T)(.+)( -.+)", x).group(2))
# 0    16:30.00
# 1    00:25.00
# dtype: object
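The question also asks for a float, which neither snippet above produces, since 16:30.00 is not a valid float literal. One possible approach, sketched here with Series.str.extract, assumes the text should be read as hours:minutes and turned into fractional hours. That interpretation is a guess, and the names times, parts and hours are only illustrative.

```python
import pandas as pd

s = pd.Series(["2020-01-01T16:30.00 - 1.00", "2020-12-04T00:25.00 - 14.00"])

# Capture the text between the "T" and the " -" separator in a single pass.
# expand=False returns a Series because the pattern has one capture group.
times = s.str.extract(r"T(.+?) -", expand=False)
# times holds "16:30.00" and "00:25.00"

# Assumption: the value means hours:minutes. Split on ":" and convert both
# pieces to float, then combine them into fractional hours.
parts = times.str.split(":", expand=True).astype(float)
hours = parts[0] + parts[1] / 60
# hours[0] is 16.5 (16 hours plus 30 minutes)
```

If the value should be interpreted differently (for example minutes:seconds), only the final arithmetic line would need to change.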

Upvotes: 1
