Boo Yan Jiong

Reputation: 2731

Golang complex regex with FindAllStringSubmatch

I have a string describing superheroes. Every superhero has a name, but not all of them have attributes.

Each record has the format ⛦name⛯attrName☾attrData☽, where the attrName☾attrData☽ part is optional and may repeat.

So, the superheroes string is:

⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽

I want to use a regex to parse the string and populate the result into a slice of maps, like this:

[ {name: superman, shirt: blue},
  {name: joker},
  {name: spiderman, age: 15yo, girlFriend: Cindy} ]

I can't get this to work in the Go Playground. I am using the regex ⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)*, but it captures only a single attribute per record, so for spiderman the age attribute is not captured.

My code is:

func main() {
    re := regexp.MustCompile("⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)*")
    fmt.Printf("%q\n", re.FindAllStringSubmatch("⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽", -1))
}

The Go Playground code is here: https://play.golang.org/p/Epv66LVwuRK

The run result is:

[
    ["⛦superman⛯shirt☾blue☽" "superman" "shirt" "blue"]
    ["⛦joker⛯" "joker" "" ""]
    ["⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽" "spiderman" "girlFriend" "Cindy"]
]

The age attribute is missing. Any idea why?

Upvotes: 1

Views: 345

Answers (2)

Wiktor Stribiżew

Reputation: 626748

You cannot capture an arbitrary number of substrings with a single capturing group: when a group is repeated, only the text from its last iteration is kept. You need to match the whole record first, and then match its subparts with another regex.

See an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    str := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽"

    // reMain matches one whole record: Group 1 is the name,
    // Group 2 is the whole run of attribute pairs (possibly empty).
    reMain := regexp.MustCompile(`⛦(\w+)⛯((?:\w+☾\w+☽)*)`)
    // reAux matches a single attrName☾attrData☽ pair inside Group 2.
    reAux := regexp.MustCompile(`(\w+)☾(\w+)☽`)
    for _, match := range reMain.FindAllStringSubmatch(str, -1) {
        fmt.Printf("%v\n", match[1])
        for _, matchAux := range reAux.FindAllStringSubmatch(match[2], -1) {
            fmt.Printf("%v: %v\n", matchAux[1], matchAux[2])
        }
        fmt.Println("--END OF MATCH--")
    }
}


Output:

superman
shirt: blue
--END OF MATCH--
joker
--END OF MATCH--
spiderman
age: 15yo
girlFriend: Cindy
--END OF MATCH--

Here, ⛦(\w+)⛯((?:\w+☾\w+☽)*) is the main regex: it captures the record's name into Group 1 and the whole run of attribute pairs into Group 2. You then iterate over the matches and collect every key-value pair from Group 2 using (\w+)☾(\w+)☽.
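
The question asks for a slice of maps rather than printed lines. As a minimal sketch, the same two-step approach could fill that structure as shown below; the helper name parseHeroes and the variable names reRecord and reAttr are made up for illustration, and the "name" key follows the desired output in the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns taken from the answer above: reRecord matches one whole record,
// reAttr matches each attrName☾attrData☽ pair inside that record.
var (
	reRecord = regexp.MustCompile(`⛦(\w+)⛯((?:\w+☾\w+☽)*)`)
	reAttr   = regexp.MustCompile(`(\w+)☾(\w+)☽`)
)

// parseHeroes is a hypothetical helper: it returns one map per record,
// with the record name stored under the "name" key.
func parseHeroes(s string) []map[string]string {
	var heroes []map[string]string
	for _, rec := range reRecord.FindAllStringSubmatch(s, -1) {
		hero := map[string]string{"name": rec[1]}
		// rec[2] is the run of attribute pairs; it is empty for a record like joker.
		for _, attr := range reAttr.FindAllStringSubmatch(rec[2], -1) {
			hero[attr[1]] = attr[2]
		}
		heroes = append(heroes, hero)
	}
	return heroes
}

func main() {
	in := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girlFriend☾Cindy☽"
	for _, hero := range parseHeroes(in) {
		fmt.Println(hero)
	}
}
```

One caveat of this sketch: an attribute that is itself called name would overwrite the record name in the map, so a separate struct field may be a safer choice if that can happen in your data.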

Upvotes: 2

saddam

Reputation: 829

Your regex, ⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)*, has only one pair of capturing groups for the key and the value, so each match can report just one key-value pair:

[["⛦superman⛯shirt☾blue☽" "superman" "shirt" "blue"]
["⛦joker⛯" "joker" "" ""]
["⛦spiderman⛯age☾15yo☽girl☾Cindy☽" "spiderman" "girl" "Cindy"]]

I added one more key-value pair to the regex and made each pair optional with ? instead of repeating it with *, so the age value is captured as well. Note that this handles at most two attributes per record:

re := regexp.MustCompile("⛦(\\w+)⛯(?:(\\w+)☾(\\w+)☽)?(?:(\\w+)☾(\\w+)☽)?")
fmt.Printf("%q\n", re.FindAllStringSubmatch("⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girl☾Cindy☽", -1))
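
For comparison, here is a rough sketch of a fixed-slot variant in which every key-value pair has its own optional (?) group, turned into a map per record. It assumes a record never has more than two attributes; the names reFixed and toMap are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: at most two attributes per record, so two optional pairs are enough.
// Groups 2/3 hold the first key/value slot, groups 4/5 the second.
var reFixed = regexp.MustCompile(`⛦(\w+)⛯(?:(\w+)☾(\w+)☽)?(?:(\w+)☾(\w+)☽)?`)

// toMap is a hypothetical helper that converts one submatch slice into a map.
// A slot that did not take part in the match is reported as "" and is skipped.
func toMap(m []string) map[string]string {
	hero := map[string]string{"name": m[1]}
	for i := 2; i+1 < len(m); i += 2 {
		if m[i] != "" {
			hero[m[i]] = m[i+1]
		}
	}
	return hero
}

func main() {
	in := "⛦superman⛯shirt☾blue☽⛦joker⛯⛦spiderman⛯age☾15yo☽girl☾Cindy☽"
	for _, m := range reFixed.FindAllStringSubmatch(in, -1) {
		fmt.Println(toMap(m))
	}
}
```

With this pattern, a third attribute on a record would be skipped without any error, so the approach only fits data whose maximum attribute count is known in advance.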

Upvotes: 1
