Claudia

Reputation: 87

Python's regex star quantifier not working as expected

I'm trying to use regular expressions to select only groups of words within quotation marks.

Example.

Input:

this is 'a sentence' with less 'than twenty words'

Output:

['a sentence', 'than twenty words']

The regex I'm using is:

'\'[\w]+[ ]+[[\w]+[ ]+]*[\w]+\''

But it only returns 'than twenty words'. In fact, it only matches quoted strings that contain exactly two spaces.

Upvotes: 1

Views: 811

Answers (4)

Saeed Ghareh Daghi

Reputation: 1205

This returns the strings between quotation marks, including words and spaces.

import re
st = "this is 'a sentence' with less 'than twenty words'"
# Inside a character class "|" is a literal pipe, not alternation, so it is left out
re.findall(r"'([\w\s]+)'", st)

Upvotes: 1

Pedro Lobito

Reputation: 99001

Late answer, but you can use:

import re
string = "this is 'a sentence' with less 'than twenty words'"
result = re.findall("'(.*?)'", string)
print(result)
# ['a sentence', 'than twenty words']


Upvotes: 0

Ahasanul Haque

Reputation: 11144

Try this:

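import re
input_string = "this is 'a sentence' with less 'than twenty words'"
# Matches quoted text that contains at least two words
re.findall(r"\'(\s*\w+\s+\w[\s\w]*)\'", input_string)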


Upvotes: 3

Thierry Lathuille

Reputation: 24280

import re
sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
regex = re.compile(r"(?<=')\w+(?:\s+\w+)+(?=')")
regex.findall(sentence)
# ['a sentence', 'than twenty words']

We want to match strings that start and end with quotes, without including the quotes themselves in the result, so we use a positive lookbehind assertion (?<=') before the text and a lookahead assertion (?=') after it.

Inside the quotes, we want at least one word, followed by at least one group of whitespace and a word. We don't want this to be a capturing group, otherwise findall would return only that group, so we make it non-capturing by using (?:...).
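
To make the difference concrete, here is a small sketch comparing the capturing and non-capturing versions of the group. The third pattern is my own reading of the regex from the question, with the inner [ escaped so the parsing is explicit: square brackets build a character class rather than a group, so the trailing ]* only repeats a literal ], which would explain why only three-word strings matched. The variable names are just for illustration.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# Capturing group: findall returns only the group, and a repeated group
# keeps just its last repetition.
capturing = re.findall(r"(?<=')\w+(\s+\w+)+(?=')", sentence)

# Non-capturing group: findall returns the whole match.
non_capturing = re.findall(r"(?<=')\w+(?:\s+\w+)+(?=')", sentence)

# The question's pattern with the inner "[" escaped (assumed equivalent):
# quote, word, spaces, [\[\w]+ (a second word), spaces, optional literal "]",
# a third word, quote. Exactly three words are required.
question_like = re.findall(r"'[\w]+[ ]+[\[\w]+[ ]+]*[\w]+'", sentence)

print(capturing)       # [' sentence', ' words']
print(non_capturing)   # ['a sentence', 'than twenty words']
print(question_like)   # ["'than twenty words'"]
```

If you do want repetition in the question's style, wrapping the repeated part in (?:...) instead of [...] should give the intended "zero or more extra words" behaviour.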

Upvotes: 0
