Reputation: 87
I'm trying to use regular expressions to select only groups of words within quotation marks.
Example.
Input:
this is 'a sentence' with less 'than twenty words'
Output:
['a sentence', 'than twenty words']
The regex I'm using is:
'\'[\w]+[ ]+[[\w]+[ ]+]*[\w]+\''
But it only returns 'than twenty words'. In fact, it only returns the quoted strings that contain two spaces.
Upvotes: 1
Views: 811
Reputation: 1205
This returns the strings between quotation marks, as long as they consist of word characters and whitespace.
import re
st = "this is 'a sentence' with less 'than twenty words'"
# [\w\s]+ matches one or more word or whitespace characters
re.findall(r"'([\w\s]+)'", st)
Upvotes: 1
Reputation: 99001
Late answer, but you can use:
import re
string = "this is 'a sentence' with less 'than twenty words'"
result = re.findall(r"'(.*?)'", string)
print(result)
# ['a sentence', 'than twenty words']
Upvotes: 0
Reputation: 11144
Try this:
import re
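input_string = "this is 'a sentence' with less 'than twenty words'"
re.findall(r"'(\s*\w+\s+\w[\s\w]*)'", input_string)
# ['a sentence', 'than twenty words']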
Upvotes: 3
Reputation: 24280
import re
sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"
regex = re.compile(r"(?<=')\w+(?:\s+\w+)+(?=')")
regex.findall(sentence)
# ['a sentence', 'than twenty words']
We want to match strings that start and end with quotes without capturing the quotes themselves, so we use a positive lookbehind assertion (?<=') before and a lookahead assertion (?=') after.
Inside the quotes, we want at least one word, followed by at least one group of whitespace and a word. We don't want this to be a capturing group, otherwise findall would return only this group, so we make it non-capturing by using (?:...).
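To see why the group has to be non-capturing, here is a minimal sketch that runs both forms of the pattern on the same sentence. The variable names capturing and non_capturing are only illustrative.

```python
import re

sentence = "this is 'a sentence' with less 'than twenty words' and a 'lonely' word"

# With a capturing group, findall returns only what the group matched
# on its last repetition, not the whole quoted string.
capturing = re.findall(r"(?<=')\w+(\s+\w+)+(?=')", sentence)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r"(?<=')\w+(?:\s+\w+)+(?=')", sentence)

print(capturing)      # [' sentence', ' words']
print(non_capturing)  # ['a sentence', 'than twenty words']
```

In both cases 'lonely' is skipped, because the pattern requires at least one whitespace-plus-word group after the first word.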
Upvotes: 0