jqwerty

Reputation: 37

How can I read a file as nested lists of coordinates for multiple polygons?

I have a file with many sections like the following:

[40.742742,-73.993847]
[40.739389,-73.985667]
[40.74715499999999,-73.97992]
[40.750573,-73.988415]
[40.742742,-73.993847]

[40.734706,-73.991915]
[40.736917,-73.990263]
[40.736104,-73.98846]
[40.740315,-73.985263]
[40.74364800000001,-73.993353]
[40.73729099999999,-73.997988]
[40.734706,-73.991915]

[40.729226,-74.003463]
[40.7214529,-74.006038]
[40.717745,-74.000389]
[40.722299,-73.996634]
[40.725291,-73.994413]
[40.729226,-74.003463]
[40.754604,-74.007836]
[40.751289,-74.000649]
[40.7547179,-73.9983309]
[40.75779,-74.0054339]
[40.754604,-74.007836]

I need to read each of these sections in as a list of coordinate pairs (each section is separated by an extra \n).

In a similar file I have (the same format, except without the extra newline breaks), I draw a single polygon from the whole file. I use the following code to read in the coordinates and plot it with matplotlib:

import matplotlib.pyplot as plt

mVerts = []
with open('Manhattan_Coords.txt') as f:
    for line in f:
        # Strip the surrounding brackets, then split on the comma;
        # float() tolerates any surrounding whitespace.
        pair = [float(s) for s in line.strip()[1:-1].split(",")]
        mVerts.append(pair)

plt.plot(*zip(*mVerts))
plt.show()

How can I accomplish the same task with more than one polygon, where each polygon in my file is separated by an extra newline?

Upvotes: 1

Views: 637

Answers (3)

roippi

Reputation: 25954

Here's my personal favorite way to "chunk" a file into groups of things that are separated by whitespace:

from itertools import groupby

def chunk_groups(it):
    # Strip each line, then group consecutive lines by truthiness:
    # runs of non-empty lines group together, blank lines act as separators.
    stripped_lines = (x.strip() for x in it)
    for k, group in groupby(stripped_lines, bool):
        if k:
            yield list(group)

And I'd recommend ast.literal_eval to turn those string representations of lists into actual Python lists:

from ast import literal_eval

with open(filename) as f:
    result = [[literal_eval(li) for li in chunk] for chunk in chunk_groups(f)]

Gives:

result
Out[66]: 
[[[40.742742, -73.993847],
  [40.739389, -73.985667],
  [40.74715499999999, -73.97992],
  [40.750573, -73.988415],
  [40.742742, -73.993847]],
 [[40.734706, -73.991915],
  [40.736917, -73.990263],
  [40.736104, -73.98846],
  [40.740315, -73.985263],
  [40.74364800000001, -73.993353],
  [40.73729099999999, -73.997988],
  [40.734706, -73.991915]],
 [[40.729226, -74.003463],
  [40.7214529, -74.006038],
  [40.717745, -74.000389],
  [40.722299, -73.996634],
  [40.725291, -73.994413],
  [40.729226, -74.003463],
  [40.754604, -74.007836],
  [40.751289, -74.000649],
  [40.7547179, -73.9983309],
  [40.75779, -74.0054339],
  [40.754604, -74.007836]]]
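
To then draw the polygons, each chunk can be unzipped and plotted separately, exactly as in the single-polygon code from the question. A minimal sketch, assuming matplotlib is imported as plt and result is built as above:

import matplotlib.pyplot as plt

# Each element of `result` is one polygon: a list of [lat, lon] pairs.
for polygon in result:
    plt.plot(*zip(*polygon))
plt.show()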

Upvotes: 4

Joe Kington

Reputation: 284622

There are a lot of nifty approaches taken in the answers already posted. There's nothing wrong with any of them.

However, there's also nothing wrong with taking the obvious-but-readable approach.

On a side note, you seem to be working with geographic data. This sort of format is something you'll run into all the time, and the segment delimiter often isn't as obvious as an extra newline. There are a lot of fairly bad ad-hoc "ascii export" formats out there, particularly in obscure proprietary software. For example, one common format uses an F at the end of the last line in a segment as the delimiter (i.e. 1.0 2.0F); a sketch of handling that case follows below. Plenty of others don't use a delimiter at all, and instead require you to start a new segment/polygon whenever you're more than some distance "x" away from the last point. Furthermore, these things often wind up being multi-GB ascii files, so reading the entire file into memory can be impractical.
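
Here's what a reader for that F-terminated variant might look like. This is a minimal sketch under assumed format details (one "lat lon" pair per line, with an F appended to the last line of each segment); it isn't tied to any specific real export format:

def f_delimited_segments(infile):
    # Hypothetical format: each line is "lat lon", and the last line of a
    # segment ends with an 'F' (e.g. "1.0 2.0F").
    segment = []
    for line in infile:
        line = line.strip()
        if not line:
            continue
        ends_segment = line.endswith('F')
        if ends_segment:
            line = line[:-1]
        segment.append([float(x) for x in line.split()])
        if ends_segment:
            yield segment
            segment = []
    if segment:
        # Trailing segment that wasn't closed with an 'F'.
        yield segment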


My point is: Regardless of the approach you choose, make sure you understand it. You're going to be doing this again, and it's going to be just different enough to be difficult to generalize. You absolutely should learn libraries like itertools well, but make sure you fully understand the functions you're calling.
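
For instance, it's worth checking at the interpreter what itertools.groupby with a bool key actually yields on a mix of blank and non-blank lines (a small illustration, not tied to any particular file):

from itertools import groupby

lines = ['[1,2]', '[3,4]', '', '', '[5,6]']
for key, group in groupby(lines, bool):
    # Blank lines are falsy, so runs of them form groups with key=False;
    # runs of non-blank lines form groups with key=True.
    print(key, list(group))
# True ['[1,2]', '[3,4]']
# False ['', '']
# True ['[5,6]']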


Here's one version of the "obvious-but-readable" approach. It's more verbose, but no one is going to be left scratching their heads as to what it does. (You could write this same logic several slightly different ways. Use what makes the most sense to you.)

import matplotlib.pyplot as plt

def polygons(infile):
    group = []
    for line in infile:
        line = line.strip()
        if line:
            # Strip the surrounding brackets and convert to floats.
            # (Build a list rather than a lazy map object so the pair
            # can be iterated more than once under Python 3.)
            coords = line[1:-1].split(',')
            group.append([float(x) for x in coords])
        else:
            # Blank line: the current polygon is complete.
            yield group
            group = []
    else:
        # The for-loop's else clause runs when the loop finishes normally,
        # yielding the final polygon (the file may not end with a blank line).
        yield group

fig, ax = plt.subplots()
ax.ticklabel_format(useOffset=False)

with open('data.txt', 'r') as infile:
    for poly in polygons(infile):
        ax.plot(*zip(*poly))

plt.show()

(Resulting matplotlib plot of the polygons.)

Upvotes: 2

Veedrac

Reputation: 60147

A slight variation on roippi's idea, using json instead of ast:

import json
from itertools import groupby

with open(FILE, "r") as coordinates_file:
    grouped = groupby(coordinates_file, lambda line: line.isspace())
    groups = (group for empty, group in grouped if not empty)

    polygons = [[json.loads(line) for line in group] for group in groups]
from pprint import pprint
pprint(polygons)
#>>> [[[40.742742, -73.993847],
#>>>   [40.739389, -73.985667],
#>>>   [40.74715499999999, -73.97992],
#>>>   [40.750573, -73.988415],
#>>>   [40.742742, -73.993847]],
#>>>  [[40.734706, -73.991915],
#>>>   [40.736917, -73.990263],
#>>>   [40.736104, -73.98846],
#>>>   [40.740315, -73.985263],
#>>>   [40.74364800000001, -73.993353],
#>>>   [40.73729099999999, -73.997988],
#>>>   [40.734706, -73.991915]],
#>>>  [[40.729226, -74.003463],
#>>>   [40.7214529, -74.006038],
#>>>   [40.717745, -74.000389],
#>>>   [40.722299, -73.996634],
#>>>   [40.725291, -73.994413],
#>>>   [40.729226, -74.003463],
#>>>   [40.754604, -74.007836],
#>>>   [40.751289, -74.000649],
#>>>   [40.7547179, -73.9983309],
#>>>   [40.75779, -74.0054339],
#>>>   [40.754604, -74.007836]]]

Upvotes: 2
