Python regex to get numbers between optional strings

Question

I have the following strings:

"Start 2 h 30 m End 3 h 20 m"
"Start 30 m End 10 m"

How can I extract only the numbers, so that the output is:

|Start_h|Start_m|End_h|End_m|
|-------|-------|-----|-----|
|      2|     30|    3|   20|
|    NaN|     30|  NaN|   10|

My attempt used str.extract in pandas:

df['time'].str.extract(r'Start (\w+) h (\w+) m ')

but this doesn't capture the minutes on their own when "h" is not present.
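A quick check with Python's standard re module (a minimal sketch, not part of the original question; the name ATTEMPT is just illustrative) shows why: " h " is a required literal in that pattern, so a string with no hours never matches at all, and str.extract fills that row with NaN. The pattern also only looks at the "Start" part, never at "End".

```python
import re

# The pattern from the question: " h " is mandatory, so hours-free strings fail.
ATTEMPT = r'Start (\w+) h (\w+) m '

texts = ["Start 2 h 30 m End 3 h 20 m", "Start 30 m End 10 m"]
for text in texts:
    match = re.search(ATTEMPT, text)
    # Prints ('2', '30') for the first text and None for the second.
    print(match.groups() if match else None)
```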

DYZ · Accepted Answer

Here's a more robust pattern:

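import pandas as pd

df = pd.DataFrame({'time': ["Start 2 h 30 m End 3 h 20 m", "Start 30 m End 10 m"]})
TIME = r"(?:(\d+) h )?(\d+) m"  # optional hours, required minutes
PATTERN = "Start {} End {}".format(TIME, TIME)
df['time'].str.extract(PATTERN)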
#     0   1    2   3
#0    2  30    3  20
#1  NaN  30  NaN  10

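Note that you need a separate capturing group for each column, four in total. The hours part is wrapped in a non-capturing group (?:...) so that it can be optional without producing an extra column.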
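If you want the columns named as in the desired output, one variant (a sketch, not part of the accepted answer) uses named groups (?P<name>...), which str.extract turns into column names. The helper time_pattern is hypothetical, and the astype(float) conversion is an extra step, since str.extract returns strings.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"time": ["Start 2 h 30 m End 3 h 20 m", "Start 30 m End 10 m"]})

def time_pattern(prefix):
    # Hypothetical helper: optional "<n> h " part, required "<n> m" part,
    # with named groups such as Start_h / Start_m.
    return r"(?:(?P<{0}_h>\d+) h )?(?P<{0}_m>\d+) m".format(prefix)

PATTERN = "Start {} End {}".format(time_pattern("Start"), time_pattern("End"))

# Each named group becomes a column; an unmatched optional group becomes NaN.
result = df["time"].str.extract(PATTERN).astype(float)
print(result)
```

Converting to float keeps NaN for the missing hours; use a nullable integer dtype instead if you need whole numbers.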
