Orby

Reputation: 418

Regex To Parse Strings To Map

I have the following String that I want to parse into a Map of Maps that can contain both String and Integer values. Each outer key may be made up of letters, spaces and/or special characters, and each value is another map enclosed in curly braces "{" and "}":

    {A->B={A=0, C=2, B=3, D=“A”, M=0, H=7, key=A->B},
     B->C={A=0, C=2, B=3, D=“A”, M=0, H=7, key=B->C}, 
     D & E={A=0, C=2, B=4, D=“A”, M=0, H=7, key=D & E},
     FGH={A=0, C=2, B=3, D=“A”, M=0, H=7, key=FGH}}

I need a regex that will identify the key/values inside the curly braces so I can parse these into a map before storing the map in another outer map using a key from the inner map.

Here’s my code so far:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StringToMapOfMapsDemo {

    public static void main(String[] args) {

        String str = "{A->B={A=0, C=2, B=3, D=“A”, M=0, H=7, key=A->B},"
                + " B->C={A=0, C=2, B=3, D=“A”, M=0, H=7, key=B->C},"
                + " D & E={A=0, C=2, B=4, D=“A”, M=0, H=7, key=D & E},"
                + " FGH={A=0, C=2, B=3, D=“A”, M=0, H=7, key=FGH}}";
        List<String> stringList = Stream.of(str.split("\\s*[{},]\\s*")).map(String::trim).collect(Collectors.toList());
        System.out.println(stringList);
        Map<String, Object> outerMap = new HashMap<>();
        for (String keyValue : stringList) {
            System.out.println(keyValue);
            Map<String, String> innerMap = new HashMap<>();
            String[] keyValueParts = keyValue.split("=");
            System.out.println(keyValueParts);
            innerMap.put(keyValueParts[0], keyValueParts[1]);
            if (innerMap.containsKey("key")){
                String keyForOuterMap = innerMap.get("key");
                outerMap.put(keyForOuterMap, innerMap);
            }
        }
        System.out.println(outerMap);
    }
}

Upvotes: 0

Views: 653

Answers (1)

Bohemian

Reputation: 425033

I can’t help with your current approach, because you split() first. That consumes the separators, and the separators are needed (via a lookahead) to match the keys using this regex:

[^{}=,\s][^{}=,]+(?==\{)

See live demo.
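
As a rough illustration of how that regex could be applied to the original, unsplit string, here is a minimal sketch using java.util.regex. The class name OuterKeyFinder and the method findOuterKeys are made up for this example, and the input is a shortened version of the sample from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OuterKeyFinder {

    // The regex from above: a run of characters other than { } = , that does
    // not start with whitespace and is immediately followed by "={".
    private static final Pattern OUTER_KEY =
            Pattern.compile("[^{}=,\\s][^{}=,]+(?==\\{)");

    // Hypothetical helper: collects every outer key in order of appearance.
    public static List<String> findOuterKeys(String input) {
        List<String> keys = new ArrayList<>();
        Matcher matcher = OUTER_KEY.matcher(input);
        while (matcher.find()) {
            keys.add(matcher.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Shortened sample input (an assumption for brevity, not the full string)
        String str = "{A->B={A=0, C=2, key=A->B}, D & E={A=0, B=4, key=D & E}, FGH={A=0, key=FGH}}";
        System.out.println(findOuterKeys(str)); // prints [A->B, D & E, FGH]
    }
}
```

Note that because of the `+`, the pattern as written only matches keys that are at least two characters long, which is what keeps it from matching the single-letter inner keys.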

Rather than reinvent the wheel, I would first convert the input to JSON by adding " in the appropriate places, then parse it to a Map<String, Object> using whatever JSON library you want. Any of them can handle nested maps, as well as keys/values with quotes in them (which you would need to escape).

Here's some code to do that (tested and works with sample input provided in question):

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import java.util.LinkedHashMap;
import java.util.Map;

str = str.trim().replace("\"", "\\\""); // trim, escape quotes
str = str.replaceAll("([{,])\\s*(.*?)=", "$1\"$2\"="); // quote names
str = str.replaceAll("=\\s*(?!\\{)(.*?)([},])", "=\"$1\"$2"); // quote values
str = str.replace('=', ':'); // replace = with :
// str is now valid json

// parse, choosing the jackson library
ObjectMapper mapper = new ObjectMapper().enable(SerializationFeature.INDENT_OUTPUT); // with pretty option
// parse, deserialize to LinkedHashMap to preserve order (TypeReference avoids an unchecked cast)
Map<String, Object> map = mapper.readValue(str, new TypeReference<LinkedHashMap<String, Object>>() {});
// print parsed map as correctly indented json (see INDENT_OUTPUT enabled above)
System.out.println(mapper.writeValueAsString(map));
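
One thing to be aware of: the "quote values" step above wraps every non-map value in quotes, so numbers such as 0 and 7 come back from the parser as Strings. If Integer values are needed, as the question asks, a post-processing pass along these lines might work. This is only a sketch: NumericValueConverter and convertNumbers are names made up for the example, it uses only java.util, and a hand-built map stands in for the parsed result.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NumericValueConverter {

    // Hypothetical helper: returns a copy of the map in which every String
    // that looks like a whole number is replaced by an Integer.
    // Nested maps are converted recursively; everything else is kept as-is.
    public static Map<String, Object> convertNumbers(Map<String, Object> input) {
        Map<String, Object> output = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : input.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                output.put(entry.getKey(), convertNumbers(nested));
            } else if (value instanceof String && ((String) value).matches("-?\\d{1,9}")) {
                // At most nine digits, so the value always fits in an int
                output.put(entry.getKey(), Integer.valueOf((String) value));
            } else {
                output.put(entry.getKey(), value);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Stand-in for the map produced by the JSON parser (all values are Strings)
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("A", "0");
        inner.put("H", "7");
        inner.put("key", "A->B");
        Map<String, Object> outer = new LinkedHashMap<>();
        outer.put("A->B", inner);

        System.out.println(convertNumbers(outer)); // prints {A->B={A=0, H=7, key=A->B}}
    }
}
```

The printed form looks the same as before, but A and H are now Integer instances rather than Strings.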

If you insist on writing your own code, you’re going to need to write a proper parser that understands the grammar of your language. Regex may help with that, but you’ll also need more complex logic that tokenizes the input and builds an AST (or the nested maps directly) from the tokens. Instead, I would do it the easy way, as per above.
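
For a sense of what the hand-written route involves, here is a minimal recursive-descent sketch for this particular format. It is an illustration under assumptions, not a full solution: the grammar in the comment is inferred from the sample input, NestedMapParser is a made-up name, keys and scalar values are assumed never to contain `{`, `}`, `,` or `=`, and quoted values such as “A” are kept verbatim, quotes included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedMapParser {

    private final String src;
    private int pos = 0;

    private NestedMapParser(String src) {
        this.src = src;
    }

    /** Parses text of the form {key=value, key={...}} into nested maps. */
    public static Map<String, Object> parse(String text) {
        NestedMapParser parser = new NestedMapParser(text.trim());
        Map<String, Object> result = parser.parseMap();
        if (parser.pos != parser.src.length()) {
            throw new IllegalArgumentException("Unexpected trailing input at index " + parser.pos);
        }
        return result;
    }

    // Assumed grammar:
    //   map   := '{' [ entry (',' entry)* ] '}'
    //   entry := key '=' (map | scalar)
    private Map<String, Object> parseMap() {
        expect('{');
        Map<String, Object> map = new LinkedHashMap<>();
        skipWhitespace();
        if (consumeIf('}')) {
            return map; // empty map
        }
        do {
            String key = readUntil("=").trim();
            expect('=');
            skipWhitespace();
            Object value = (peek() == '{') ? parseMap() : toScalar(readUntil(",}").trim());
            map.put(key, value);
            skipWhitespace();
        } while (consumeIf(','));
        expect('}');
        return map;
    }

    // Whole numbers of up to nine digits become Integers; anything else stays a String.
    private Object toScalar(String text) {
        return text.matches("-?\\d{1,9}") ? (Object) Integer.valueOf(text) : text;
    }

    // Consumes and returns characters up to (not including) the first stop character.
    private String readUntil(String stopChars) {
        int start = pos;
        while (pos < src.length() && stopChars.indexOf(src.charAt(pos)) < 0) {
            pos++;
        }
        return src.substring(start, pos);
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private boolean consumeIf(char expected) {
        if (peek() == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char expected) {
        if (!consumeIf(expected)) {
            throw new IllegalArgumentException("Expected '" + expected + "' at index " + pos);
        }
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        // Shortened sample input (an assumption for brevity)
        String str = "{A->B={A=0, C=2, key=A->B},\n D & E={A=0, B=4, key=D & E}}";
        System.out.println(parse(str)); // prints {A->B={A=0, C=2, key=A->B}, D & E={A=0, B=4, key=D & E}}
    }
}
```

Even for a format this small, the parser needs its own error handling and a decision about every edge case (escaping, quoting, empty values), which is the main reason the JSON conversion above is less work.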

Upvotes: 2
