Reputation: 787
I'm looking for a function that returns a map[string]interface{}
where interface{}
can be a slice, a a map[string]interface{}
or a value.
My use case is to parse WKT geometry like the following and retrieves point values; Example for a donut polygon:
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
The regex (I voluntary set \d that matches only integers for readability purpose):
(POLYGON \(
(?P<polygons>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d )\),)*
(?P<last_polygon>\(
(?P<points>(?P<point>(\d \d), ){3,})
(?P<last_point>\d \d)\))\)
)
I have a function (copied from SO) that retrieves some informations but it's not that good for nested groups and list of groups:
func getRegexMatchParams(reg *regexp.Regexp, url string) (paramsMap map[string]string) {
match := reg.FindStringSubmatch(url)
paramsMap = make(map[string]string)
for i, name := range reg.SubexpNames() {
if i > 0 && i <= len(match) {
paramsMap[name] = match[i]
}
}
return match
}
It seems that the group point
gets only 1 point.
example on playground
[EDIT] The result I want is something like this:
map[string]interface{}{
"polygons": map[string]interface{} {
"points": []interface{}{
{map[string]string{"point": "0 0"}},
{map[string]string{"point": "0 10"}},
{map[string]string{"point": "10 10"}},
{map[string]string{"point": "10 0"}},
},
"last_point": "0 0",
},
"last_polygon": map[string]interface{} {
"points": []interface{}{
{map[string]string{"point": "3 3"}},
{map[string]string{"point": "3 7"}},
{map[string]string{"point": "7 7"}},
{map[string]string{"point": "7 3"}},
},
"last_point": "3 3",
}
}
So I can use it further for different purposes like querying databases and validate that last_point = points[0] for each polygon.
Upvotes: 0
Views: 542
Reputation:
Try to add some whitespace to the regex.
Also note that this engine won't retain all capture group values that are
within a quantified outer grouping like (a|b|c)+
where this group will only contain the last a or b or c it finds.
And, your regex can be reduced to this
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\)(?:\s*,\s*|\s*\)))+)
https://play.golang.org/p/rLaaEa_7GX
The original:
(POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\),)*(?P<last_polygon>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\))\s*\))
https://play.golang.org/p/rZgJYPDMzl
See below for what the groups contain.
( # (1 start)
POLYGON \s* \(
(?P<polygons> # (2 start)
\( \s*
(?P<points> # (3 start)
(?P<point> # (4 start)
\s*
( \d+ \s+ \d+ ) # (5)
\s*
,
){3,} # (4 end)
) # (3 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (6)
\s* \),
)* # (2 end)
(?P<last_polygon> # (7 start)
\( \s*
(?P<points> # (8 start)
(?P<point> # (9 start)
\s*
( \d+ \s+ \d+ ) # (10)
\s*
,
){3,} # (9 end)
) # (8 end)
\s*
(?P<last_point> \d+ \s+ \d+ ) # (11)
\s* \)
) # (7 end)
\s* \)
) # (1 end)
Input
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
Output
** Grp 0 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 1 - ( pos 0 , len 65 )
POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
** Grp 2 [polygons] - ( pos 9 , len 30 )
(0 0, 0 10, 10 10, 10 0, 0 0),
** Grp 3 [points] - ( pos 10 , len 23 )
0 0, 0 10, 10 10, 10 0,
** Grp 4 [point] - ( pos 27 , len 6 )
10 0,
** Grp 5 - ( pos 28 , len 4 )
10 0
** Grp 6 [last_point] - ( pos 34 , len 3 )
0 0
** Grp 7 [last_polygon] - ( pos 39 , len 25 )
(3 3, 3 7, 7 7, 7 3, 3 3)
** Grp 8 [points] - ( pos 40 , len 19 )
3 3, 3 7, 7 7, 7 3,
** Grp 9 [point] - ( pos 54 , len 5 )
7 3,
** Grp 10 - ( pos 55 , len 3 )
7 3
** Grp 11 [last_point] - ( pos 60 , len 3 )
3 3
Possible Solution
It's not impossible. It just takes a few extra steps.
(As an aside, isn't there a library for WKT that can parse this for you ?)
Now, I don't know your language capabilities, so this is just a general approach.
1. Validate the form you're parsing.
This will validate and return all polygon sets as a single string in All_Polygons
group.
Target POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)
** Grp 1 [All_Polygons] - ( pos 9 , len 55 )
(0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
2. If 1 was successful, set up a loop match using the output of All_Polygons
string.
Target (0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))
This step is equivalent of a find all type of match. It should match successive values of all the points of a single polygon, returned in Single_Poly_All_Pts
group string.
This will give you these 2 separate matches, which can be put into a temp array having 2 value strings:
** Grp 1 [Single_Poly_All_Pts] - ( pos 1 , len 27 )
0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Poly_All_Pts] - ( pos 31 , len 23 )
3 3, 3 7, 7 7, 7 3, 3 3
3. If 2 was successful, set up a loop match using the temp array output of step 2.
This will give you the individual points of each polygon.
(?P<Single_Point>\d+\s+\d+)
Again this is a loop match (or a find all type of match). For each array element
(Polygon), this will produce the individual points.
Target[element 1] 0 0, 0 10, 10 10, 10 0, 0 0
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
0 0
** Grp 1 [Single_Point] - ( pos 5 , len 4 )
0 10
** Grp 1 [Single_Point] - ( pos 11 , len 5 )
10 10
** Grp 1 [Single_Point] - ( pos 18 , len 4 )
10 0
** Grp 1 [Single_Point] - ( pos 24 , len 3 )
0 0
And,
Target[element 2] 3 3, 3 7, 7 7, 7 3, 3 3
** Grp 1 [Single_Point] - ( pos 0 , len 3 )
3 3
** Grp 1 [Single_Point] - ( pos 5 , len 3 )
3 7
** Grp 1 [Single_Point] - ( pos 10 , len 3 )
7 7
** Grp 1 [Single_Point] - ( pos 15 , len 3 )
7 3
** Grp 1 [Single_Point] - ( pos 20 , len 3 )
3 3
Upvotes: 2