smaili

Reputation: 1235

Simple python regex not working as intended

I'm trying to extract the number before the - and the rest of the string after it, but my pattern only captures the number. Here's the output from the interactive terminal:

>>> a = '#232 - Hello There'
>>> re.findall('#(.*?) - (.*?)', a)
[('232', '')]

Why is my regex not working properly?

Upvotes: 1

Views: 1076

Answers (3)

Eric

Reputation: 97575

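Your regex is fine, but you're using the wrong function from re. re.fullmatch only succeeds if the pattern consumes the whole string, so the trailing lazy group is forced to extend to the end. The following matches things correctly: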

m = re.fullmatch('#(.*?) - (.*?)', a)
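To see the difference concretely, here is a minimal sketch comparing the two calls on the string from the question. The variable names found and groups are just for illustration; re.fullmatch needs Python 3.4 or later.

```python
import re

a = '#232 - Hello There'
pattern = r'#(.*?) - (.*?)'

# findall: nothing follows the last lazy group, so it settles for zero characters.
found = re.findall(pattern, a)

# fullmatch: the match has to reach the end of the string,
# so the last lazy group is forced to expand.
m = re.fullmatch(pattern, a)
groups = m.groups() if m else None

print(found)   # [('232', '')]
print(groups)  # ('232', 'Hello There')
```

Note that fullmatch returns None when the string does not fit the pattern, so check the result before calling groups().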

Upvotes: 0

heemayl

Reputation: 41987

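.*? is non-greedy, i.e. it matches the shortest substring that still lets the overall match succeed. Since nothing follows the second group, that is the empty string. You need the greedy version, .* (which matches the longest possible substring), for the latter group: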

In [1143]: a = '#232 - Hello There'

In [1144]: re.findall('#(.*?) - (.*?)', a)
Out[1144]: [('232', '')]

In [1145]: re.findall('#(.*?) - (.*)', a)
Out[1145]: [('232', 'Hello There')]
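As a rough illustration of why the last group comes back empty, the sketch below runs three variants of the pattern side by side. The third variant, which keeps the lazy quantifier but anchors it with $, is an extra option not mentioned above; the variable names are made up for the example.

```python
import re

a = '#232 - Hello There'

# Lazy with nothing after it: zero characters already satisfy the pattern.
lazy = re.search(r'#(.*?) - (.*?)', a).groups()

# Greedy: takes everything up to the end of the line.
greedy = re.search(r'#(.*?) - (.*)', a).groups()

# Lazy but anchored: $ forces the group to grow until the end of the string.
anchored = re.search(r'#(.*?) - (.*?)$', a).groups()

print(lazy)      # ('232', '')
print(greedy)    # ('232', 'Hello There')
print(anchored)  # ('232', 'Hello There')
```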

But you should use str methods for such simple cases, e.g. str.split with ' - ' as the separator:

In [1146]: a.split(' - ')
Out[1146]: ['#232', 'Hello There']

Or with str.partition on ' - ' and slicing to drop the separator:

In [1147]: a.partition(' - ')[::2]
Out[1147]: ('#232', 'Hello There')
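Both results still carry the leading #, and split would produce extra pieces if ' - ' appeared more than once. A small sketch of a helper that handles both, assuming the input always looks like '#<number> - <text>' (split_label is a hypothetical name, not something from the question):

```python
def split_label(text):
    """Hypothetical helper: return (number, rest), or None if the format doesn't fit."""
    # partition splits on the first ' - ' only, so later dashes stay in the tail.
    head, sep, tail = text.partition(' - ')
    if not sep or not head.startswith('#'):
        return None
    return head[1:], tail  # drop the leading '#'


print(split_label('#232 - Hello There'))    # ('232', 'Hello There')
print(split_label('#232 - Hello - There'))  # ('232', 'Hello - There')
print(split_label('no separator here'))     # None
```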

Upvotes: 7

Emma

Reputation: 27723

This expression should extract the desired values:

([0-9]+)\s*-\s*(.*)

Test

import re

print(re.findall(r"([0-9]+)\s*-\s*(.*)", "#232 - Hello There"))

Output

[('232', 'Hello There')]

Upvotes: 0
