FlyingZebra1

Reputation: 1346

Python RegEx extract text between two patterns

I am trying to extract the values of lat and lng from the following:

coordinates = 
[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n     

However, when I apply this regex:

found = re.search('lat:(.*),', coordinates).group(1)

Everything after "lat:" is returned.
However, the desired result is just the number, with the match stopping at the first comma. This puzzles me, because even Rubular shows that the pattern should work. Any ideas what I could be doing wrong here?

P.S. I have spent some time looking through the related solutions on Stack Overflow, but no dice.

Upvotes: 0

Views: 1217

Answers (3)

RomanPerekhrest

Reputation: 92854

A more reliable approach is to use the re.findall function:

import re

coordinates = '[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n '
result = re.findall(r'\b(?:lat|lng): -?\d+\.\d+', coordinates)

print(result)

The output:

['lat: 42.050994', 'lng: -88.077711']
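If the numbers are needed rather than the "lat: ..." strings, one possible variation is to add capture groups and build a dict of floats. This is only a sketch: the snippet variable is a shortened, hypothetical stand-in for the question's input, and the pattern assumes plain signed decimal numbers.

```python
import re

# Shortened, hypothetical stand-in for the script text in the question.
snippet = "var myLatLng = {\r\n    lat: 42.050994,\r\n    lng: -88.077711    };"

# Group 1 captures the key (lat or lng), group 2 the signed decimal number.
pairs = re.findall(r'\b(lat|lng):\s*(-?\d+(?:\.\d+)?)', snippet)

# With two groups, findall returns (key, value) tuples; convert values to float.
coords = {name: float(value) for name, value in pairs}
print(coords)
```

The values can then be read by key, for example coords['lat'].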

Upvotes: 3

Martin Evans

Reputation: 46759

Use the following to extract the two values:

import re

text = """[<div class="store-map">\n<div id="map" style="width: 100%; height: 400px;"></div>\n<script>\r\n                function initMap() {\r\n                    var myLatLng = {\r\n                        lat: 42.050994,\r\n                        lng: -88.077711                    };\r\n\r\n     """

lat, lng = map(float, re.findall(r'(?:lat|lng):\s+([0-9.-]*?)[, ]', text))
print(lat, lng)

Giving you two floats as:

42.050994 -88.077711

Upvotes: 1

Dmitry Egorov

Reputation: 9650

This is because .* is greedy: it matches as much as possible, so the capture runs all the way to the last comma in the input. Change it to the lazy form .*? so that it stops at the first comma:

lat:(.*?),
       ^
   add this
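
To see the difference side by side, here is a small sketch. The sample string is a hypothetical single-line stand-in for the question's input (the original is truncated, so the trailing "zoom" part is an assumption added only to provide a later comma). The negated character class at the end is an alternative technique, not part of the fix above.

```python
import re

# Hypothetical one-line stand-in for the question's input; the "opts" part is
# invented here purely so that a comma appears after the lat value.
sample = "var myLatLng = { lat: 42.050994, lng: -88.077711 }; var opts = { zoom: 12, center: myLatLng };"

# Greedy: .* grabs as much as it can, then backtracks to the LAST comma.
greedy = re.search(r'lat:(.*),', sample).group(1)

# Lazy: .*? grabs as little as it can, so it stops at the FIRST comma.
lazy = re.search(r'lat:(.*?),', sample).group(1)

# Alternative: a negated character class can never run past a comma.
negated = re.search(r'lat:\s*([^,]+)', sample).group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

The lazy capture still includes the space after the colon, so call .strip() on it (or pass it straight to float(), which tolerates surrounding whitespace).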

Upvotes: 0
