Michael De Keyser

Reputation: 797

Parsing a string into a list of dicts

I have a string that looks like this:

POLYGON ((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))

I can easily strip POLYGON out of the string to focus on the numbers, but I'm wondering what the easiest/best way would be to parse this string into a list of dicts.

The first parenthesis (right after POLYGON) indicates that multiple elements can be provided (separated by a comma ,).

Each pair of numbers is supposed to be an x and a y coordinate.

I'd like to parse this string to end up with the following data structure (using python 2.7):

list [ //list of polygons
  list [ //polygon n°1
    dict { //polygon n°1's first point
      'x': 148210.445767647, //first number
      'y': 172418.761192525 //second number
    },
    dict { //polygon n°1's second point
      'x': 148183.930888667,
      'y': 172366.054787545
    },
    ... // rest of polygon n°1's points
  ], //end of polygon n°1
  list [ // polygon n°2
    dict { // polygon n°2's first point
      'x': 148221.97916844,
      'y': 172344.568316375
    },
    ... // rest of polygon n°2's points
  ] // end of polygon n°2
] // end of list of polygons

A polygon can contain any number of points.
The two numbers of each point are separated by a space.

Do you know a way to do this with a loop or recursively?

PS: I'm kind of a python beginner (only a few months under my belt) so don't hesitate to explain in details. Thank you!

Upvotes: 0

Views: 181

Answers (3)

Sardorbek Imomaliev

Reputation: 15390

Let's say you have a string that looks like this:

my_str = 'POLYGON ((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))'

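my_str = my_str.replace('POLYGON ', '')
coords_groups = my_str.split('), (')

result = []
for coords in coords_groups:
    # str.replace returns a new string, so the result has to be reassigned
    coords = coords.replace('(', '').replace(')', '')
    coords_list = coords.split(', ')
    coords_list2 = []
    for item in coords_list:
        item_split = item.split(' ')
        # convert the text to floats and build one dict per point
        coords_list2.append({'x': float(item_split[0]), 'y': float(item_split[1])})
    result.append(coords_list2)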

I think this should help a little.

All you need now is a way to get the text between parentheses; this should help: Regular expression to return text between parenthesis

UPDATE: I updated the code above thanks to another answer by https://stackoverflow.com/users/2635860/mccakici, but it only works if your string has exactly the structure you described in your question.
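As a rough sketch of the regular-expression idea mentioned above, the innermost parenthesised groups could be pulled out with `re.findall` and then split into points. The function name `parse_polygon` is made up for this example, and the code assumes every group contains at least one well-formed "x y" pair; it is not a general WKT parser.

```python
import re

def parse_polygon(wkt):
    """Turn 'POLYGON ((x y, x y), (x y, ...))' into a list of lists of dicts.

    Hypothetical helper: assumes each parenthesised group is a non-empty,
    comma-separated run of "x y" pairs.
    """
    polygons = []
    # [^()]* only matches text without parentheses, so this captures
    # the innermost "(...)" groups and skips the outer wrapper.
    for group in re.findall(r'\(([^()]*)\)', wkt):
        points = []
        for pair in group.split(','):
            # split() with no argument ignores the space after each comma
            x, y = pair.split()
            points.append({'x': float(x), 'y': float(y)})
        polygons.append(points)
    return polygons

sample = 'POLYGON ((1.5 2.5, 3 4), (5 6, 7 8, 9 10))'
parsed = parse_polygon(sample)
```

Because the pattern ignores everything outside the innermost parentheses, the POLYGON prefix does not need to be stripped first.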

Upvotes: 1

mccakici

Reputation: 550

Can you try this?

import ast

POLYGON = '((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))'
new_polygon = '(' + POLYGON.replace(', ', '),(').replace(' ', ',') + ')'


data = ast.literal_eval(new_polygon)
result_list = list()
for items in data:
    sub_list = list()
    for item in items:
        sub_list.append({
            'x': item[0],
            'y': item[1]
        })
    result_list.append(sub_list)

print result_list

Upvotes: 1

GWW

Reputation: 44093

The data structure defining your Polygon object looks very similar to a Python tuple declaration. One option, albeit a bit hacky, would be to use Python's AST parser.

You would have to strip off the POLYGON part, and this solution may not work for other, more complex declarations.

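import ast
your_str = "POLYGON (...)"
# may be better to use a regex to split off the class part
# if you have different types
body = your_str.replace("POLYGON ", "")
# literal_eval needs commas between the numbers, so rewrite each "x y" as "(x,y)"
body = "(" + body.replace(", ", "),(").replace(" ", ",") + ")"
data = ast.literal_eval(body)
# data is now a tuple of polygons, each one a tuple of (x, y) pairs
# now you can loop over the pairs or make them into dictionaries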
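The last step mentioned in the comment above, turning the pairs into dictionaries, might look something like this. It is only a sketch: the `data` value here is a small hand-written stand-in for what the evaluation step is expected to produce (a tuple of polygons, each a tuple of (x, y) pairs), and `to_point_dicts` is a name invented for the example.

```python
# Hypothetical stand-in for the nested tuples produced by the parsing step
data = (((1.0, 2.0), (3.0, 4.0)), ((5.0, 6.0),))

def to_point_dicts(polygons):
    """Convert nested (x, y) tuples into a list of lists of {'x':, 'y':} dicts."""
    return [[{'x': x, 'y': y} for x, y in points] for points in polygons]

result = to_point_dicts(data)
```

The nested list comprehension keeps the outer list per polygon and the inner list per point, which matches the structure asked for in the question.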

Upvotes: 2
