Reputation: 195
I want to scrape data from this site: https://ispu.mgipu.hr/. The idea is to extract the coordinates of the little dots you can see after zooming in.
Here is the procedure (after zooming to 1:5000 at least) I want to automate:
If you check the browser's Network tab during the above process, you can see a new XHR request appear: 'obuhvat'. It is a POST request with a single payload element (a POLYGON with coordinates).
I tried to make these POST requests directly to the same site, but I always get a 400 response.
It only works if I first do everything in the browser and copy-paste the same payload into the request.
Here is my attempt:
library(httr)
# params
url <- 'https://ispu.mgipu.hr/geo/api/info-lokacija/obuhvat'
ua <- 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.146 Safari/537.36'
# this works because I have already drawn this polygon manually in the app
data <- list(
geom = "POLYGON((466131.6322632646 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851))"
)
req <- POST(url, body = data, encode = 'json', user_agent(ua))
print(req$status_code)
# this doesn't work; I have changed just one number in data
data <- list(
geom = "POLYGON((466131.6322632647 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851))"
)
req <- POST(url, body = data, encode = 'json', user_agent(ua))
print(req$status_code)
Upvotes: 1
Views: 128
Reputation: 84465
So, this is not the complete answer I would like to give but might be a starter for 10.
Thinking back to school geometry I remembered covering rules for polygons and hypothesised that perhaps your altering the co-ordinates was violating a rule.
Take your given examples -
Working:
POLYGON((470883.3817753086 4925329.690667927,470757.9415158831 4925207.6103212265,470924.8218453713 4925156.090320726,470883.3817753086 4925329.690667927))
Not working:
POLYGON((466131.6322632647 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851))
I decided to pass via a package that could check at least some rules of polygons and give me back more meaningful info than an http code.
library(sf)
#> Warning: package 'sf' was built under R version 4.0.3
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
x = st_as_sfc("POLYGON((466131.6322632646 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851)))")
st_is_valid(x, reason = TRUE)
#> [1] "Valid Geometry"
# your altered
x = st_as_sfc("POLYGON((466131.6322632647 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851)))")
st_is_valid(x, reason = TRUE)
#> Error in CPL_geos_is_valid_reason(x): Evaluation error: IllegalArgumentException: Points of LinearRing do not form a closed linestring.
Created on 2021-02-09 by the reprex package (v0.3.0)
We now get some useful info about the validity of your adjustment, according to the rules applied within the sf package.
Error in CPL_geos_is_valid_reason(x): Evaluation error: IllegalArgumentException: Points of LinearRing do not form a closed linestring.
Googling that error led me to an answer by @yellowcap:
A valid polygon or multipolygon must have identical start and endpoints.
Makes sense. You altered the start point so that it no longer matched the end point, meaning you didn't have a closed polygon.
# start 466131.6322632647 5062601.925203851
# end 466131.6322632646 5062601.925203851
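A difference in the last decimal place is easy to miss by eye, so a text-level check can catch it before the string goes anywhere near sf or the server. The following is a minimal base R sketch, not part of sf: `wkt_ring_pairs()` and `ring_is_closed()` are hypothetical helper names, and the code assumes a single-ring POLYGON with no holes and compares the coordinate pairs as text rather than as numbers.

```r
# Hypothetical helper: split a single-ring POLYGON WKT string into "x y" pairs.
# Assumes no holes and no Z/M values.
wkt_ring_pairs <- function(wkt) {
  # Drop the leading "POLYGON((" and any trailing ")" characters.
  inner <- sub("^[[:space:]]*POLYGON[[:space:]]*\\(\\(", "", wkt)
  inner <- sub("\\)+[[:space:]]*$", "", inner)
  trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
}

# TRUE when the first and last coordinate pairs are textually identical.
ring_is_closed <- function(wkt) {
  pairs <- wkt_ring_pairs(wkt)
  identical(pairs[1], pairs[length(pairs)])
}

working <- "POLYGON((466131.6322632646 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851))"
altered <- "POLYGON((466131.6322632647 5062601.925203851,466141.4322828647 5062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632646 5062601.925203851))"

ring_is_closed(working)  # TRUE
ring_is_closed(altered)  # FALSE
```

Comparing text is deliberately strict: "1.0 2" and "1 2.0" would count as different pairs, which seems a reasonable choice when the goal is to send back exactly what was built.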
Next, I tested altering a different pair of co-ordinates (neither the first nor the last): keep the start and end pairs identical and alter the second pair.
# 466141.4322828647 5062489.924979851 >> 566141.4322828647 6062489.924979851
x = st_as_sfc("POLYGON((466131.6322632647 5062601.925203851,566141.4322828647 6062489.924979851,466292.63258526527 5062489.924979851,466287.03257406526 5062596.325192652,466131.6322632647 5062601.925203851))")
st_is_valid(x, reason = TRUE)
#> [1] "Valid Geometry"
However, passing this check didn't guarantee that the polygon would be accepted by your POST. So I imagine there are other constraints that my use of sf hasn't covered. I simply don't know enough about GIS to work this out, but maybe someone on gis.stackexchange.com might?
I had a look at this for some background and it listed:
21.1. What is Validity
Validity is most important for polygons, which define bounded areas and require a good deal of structure. Lines are very simple and cannot be invalid, nor can points.
Some of the rules of polygon validity feel obvious, and others feel arbitrary (and in fact, are arbitrary).
- Polygon rings must close.
- Rings that define holes should be inside rings that define exterior boundaries.
- Rings may not self-intersect (they may neither touch nor cross one another).
- Rings may not touch other rings, except at a point.
So, I wondered whether the sf package offered a quick solve with:
st_make_valid()
Sadly, I couldn't get it to fix your failing case.
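An alternative to repairing a broken string is to avoid building one: if the WKT is generated from a coordinate matrix, the first vertex can be repeated at the end automatically. Below is a minimal base R sketch under stated assumptions: vertices arrive as a two-column matrix in ring order, and `make_polygon_wkt()` with its `digits` argument is a hypothetical helper, not an sf function. It addresses only the closure rule, not whatever else the server may be checking.

```r
# Hypothetical helper: build a closed single-ring POLYGON WKT string
# from a two-column (x, y) matrix of vertices given in ring order.
make_polygon_wkt <- function(coords, digits = 10) {
  stopifnot(is.matrix(coords), ncol(coords) == 2, nrow(coords) >= 3)
  # Format each vertex once so the repeated pair is textually identical.
  pairs <- paste(
    formatC(coords[, 1], format = "f", digits = digits),
    formatC(coords[, 2], format = "f", digits = digits)
  )
  # Append the first vertex only if the ring is not already closed.
  if (!identical(pairs[1], pairs[length(pairs)])) {
    pairs <- c(pairs, pairs[1])
  }
  paste0("POLYGON((", paste(pairs, collapse = ","), "))")
}

square <- rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10))
wkt <- make_polygon_wkt(square, digits = 1)
print(wkt)
```

The resulting string could then be passed as the `geom` element of the request body in place of a hand-edited one.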
I then started falling down a big GIS hole with little clue where I was going. I started looking at QGIS to see if I could edit geometries and work out more but didn't get far.
That's my hypothesis, with some evidence, a suggestion for where to ask/look next and a "currently defeated but interested so will revisit" exit.
Upvotes: 1