Extracting coordinates from a string

Question

Consider the following: "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))".
I want to transform this into: [[10,10],[10,40],[40,40],[30,30],[40,20],[30,10]].

My solution
I use the functions split() and replace() to format this, but the result is messy and probably not very efficient, e.g. my_str.split('((')[1].split('))')[1]...etc

Because I'm doing this on a huge dataset, I'm looking for an efficient way to do it.
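For comparison, here is a minimal sketch of the split()/replace() idea written without regular expressions. It assumes integer coordinates, one geometry per string, and that the only parentheses are the ones around the coordinate lists; the function name parse_with_str_methods is hypothetical.

```python
def parse_with_str_methods(wkt):
    # Drop the geometry keyword: keep everything from the first "(" on.
    body = wkt[wkt.index("("):]
    # Remove all parentheses so line boundaries become plain commas.
    body = body.replace("(", "").replace(")", "")
    # Each comma-separated chunk is one "x y" pair; split() also eats stray spaces.
    return [[int(v) for v in pair.split()] for pair in body.split(",")]


mstring = "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))"
print(parse_with_str_methods(mstring))
# [[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]
```

Like the desired output in the question, this flattens all lines into a single list of points.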

cs95 · Accepted Answer

If you're looking for clean code that doesn't do too much, I'd recommend a two-step process involving the re module:

split your string into smaller chunks on comma using str.split
for each chunk, extract coordinates with re.findall

For performance, I'd recommend pre-compiling the regex pattern with re.compile, since we'll be calling it repeatedly inside a loop.

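>>> import re
>>> mstring = "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))"
>>> p = re.compile(r'\d+(?:\.\d+)?')
>>> # split on commas, then pull the numbers out of each chunk
>>> [list(map(int, p.findall(x))) for x in mstring.split(',')]
[[10, 10], [10, 40], [40, 40], [30, 30], [40, 20], [30, 10]]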

Here, mstring holds your string data.
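On a large dataset, the same idea can be wrapped in a function so the compiled pattern is built once and reused for every row. This is a sketch, not the answer's own code; COORD_RE, parse_coords, parse_all and the sample rows are hypothetical names, and it assumes integer coordinates as in the question.

```python
import re

# Compiled once at module level and reused for every row.
COORD_RE = re.compile(r'\d+(?:\.\d+)?')


def parse_coords(wkt):
    """Return [[x, y], ...] for one MULTILINESTRING string (integer coords)."""
    return [list(map(int, COORD_RE.findall(chunk))) for chunk in wkt.split(',')]


def parse_all(rows):
    """Lazily parse an iterable of WKT strings, yielding one result per row."""
    return (parse_coords(row) for row in rows)


rows = [
    "MULTILINESTRING((10 10,10 40),(40 40,30 30,40 20,30 10))",
    "MULTILINESTRING((1 2,3 4))",
]
for coords in parse_all(rows):
    print(coords)
```

parse_all returns a generator, so rows are parsed one at a time and the results do not all have to be held in memory at once.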

Details

\d+    # match one or more digits
(?:    # start a non-capturing group
\.     # a literal decimal point
\d+    # one or more digits after the decimal point
)?     # end the group; the whole group is optional
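To see what the pattern does and does not match, you can run it on a few sample tokens (the input string here is made up for illustration):

```python
import re

p = re.compile(r'\d+(?:\.\d+)?')

# Integer, decimal, negative, leading-dot and trailing-dot tokens.
print(p.findall("10 40.5 -3 .5 7."))
# ['10', '40.5', '3', '5', '7']
```

A leading minus sign is not part of the match, so -3 comes back as '3', and .5 yields '5'. If your data can contain negative coordinates, a pattern such as r'-?\d+(?:\.\d+)?' (an extension, not part of the original answer) would keep the sign.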

Semantically, this regex will match integers OR floats (Ajax1234's solution currently only accounts for integers, and is guaranteed to finish searching in fewer cycles).
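Because the list comprehension converts each match with int, a decimal match such as '10.5' would raise a ValueError. A sketch that keeps decimals by converting with float instead (the sample string is made up):

```python
import re

p = re.compile(r'\d+(?:\.\d+)?')
mstring = "MULTILINESTRING((10.5 10,10 40.25),(40 40,30 30))"

# Same two steps as above, but float() accepts both '10' and '10.5'.
coords = [list(map(float, p.findall(x))) for x in mstring.split(',')]
print(coords)
# [[10.5, 10.0], [10.0, 40.25], [40.0, 40.0], [30.0, 30.0]]
```

The trade-off is that integer coordinates also come back as floats (10.0 rather than 10).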
