Regex capture numbers based on preceding text

Question

Consider the following text:

one="ambience: 5 comments:xxx food: 4 comments: xxxx service: 3 
comments: xxx" 

two="ambience: 5 comments:xxx food:   comments: since nothing to eat
after 8 pm service: 4  comments: xxxx "

three="ambience: it is a 5 comments:xxx food: a 6   comments: since nothing to eat
after 8 pm service: a 4  comments: xxxx "

For string one:

    re.findall(ur'(ambience|food|service)[\s\S]*?(\d)',one,re.UNICODE)
    [('ambience', '5'), ('food', '4'), ('service', '3')]

For string two, the result is:

    [('ambience', '5'), ('food', '8'), ('service', '4')]

Since this logic simply looks for the first digit after the keyword, the result is misleading when a rating is skipped, intentionally or otherwise. Here the '8' reported for food actually comes from "8 pm" in the comment.

If a rating is missing, how do I get the regex to return that rating as NaN?

    [('ambience', '5'), ('food', 'NaN'), ('service', '4')]

I also have a variant using look-ahead and look-behind assertions:

    re.findall(ur'(?<=food)[\s]*:[^\d]*([\d[.|-|\/|-]+)[^\d]*(?=comment[s]*[\s]*:)',one,re.UNICODE)


Answers (1)
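
One possible approach, sketched under the assumption that every rating label is eventually followed by a `comments:` label, as in the three sample strings: first capture the text between each label and the next `comments:`, then look for a number only inside that slice. The names `extract_ratings` and `SECTION` are illustrative and not from the question. The plain `r''` prefix is used because `ur''` is only valid in Python 2.

```python
import re

# Sample strings from the question (line breaks kept as "\n").
one = ("ambience: 5 comments:xxx food: 4 comments: xxxx service: 3 \n"
       "comments: xxx")
two = ("ambience: 5 comments:xxx food:   comments: since nothing to eat\n"
       "after 8 pm service: 4  comments: xxxx ")
three = ("ambience: it is a 5 comments:xxx food: a 6   comments: since nothing to eat\n"
         "after 8 pm service: a 4  comments: xxxx ")

# Group 1: the rating label. Group 2: everything up to the next "comments:".
# Assumption: a "comments:" label always closes each rating section.
SECTION = re.compile(r'(ambience|food|service)\s*:(.*?)comments?\s*:', re.DOTALL)


def extract_ratings(text):
    """Return (label, rating) pairs, with 'NaN' when a section has no number."""
    results = []
    for label, body in SECTION.findall(text):
        # Search only inside the section, so digits in the comment text
        # (such as the "8" in "8 pm") are never picked up.
        match = re.search(r'\d+', body)
        results.append((label, match.group() if match else 'NaN'))
    return results


print(extract_ratings(one))
print(extract_ratings(two))
print(extract_ratings(three))
```

The string `'NaN'` mirrors the output asked for in the question; `float('nan')` could be returned instead if a numeric missing value is preferred. Note that a label followed by a colon inside a comment (for example "food: was cold") would still confuse this sketch.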
