Nissan911

Reputation: 293

regular expression help

I have spent about 2 hours on this, but I still cannot find all the results. The string is pretty simple. It looks like this:

s:3\"content\";s:15\"another content\";

So the repeating pattern is s:(the length of the string)\"(the string content)\"; and I am trying to get the content of each string.

I have tried "[s:(.*);]+", which I expected to at least match 3\"content\" and 15\"another content\", but I got a totally wrong result.

Does anyone know how to get the string content from this pattern?

Thanks so much.

Upvotes: 1

Views: 96

Answers (2)

jcomeau_ictx

Reputation: 38432

This is Python; you'll need to do a little work to make it Java-friendly:

>>> import re
>>> s='s:3\"content\";s:15\"another content\";'
>>> re.compile('s:[0-9]+\\"([^"]+)\\";').findall(s)
['content', 'another content']

For the second string, thanks to Filgera for the suggestion to use a non-greedy wildcard:

>>> s='s:11:\"strin1\";s:6:\"\\\"\\\"\\\"\";s:4:\"string2\";s:2:\"52\";s:4:\"string3\";s:16:\"​08\/23\/2011 00:00\";s:5:\"where\";s:9:\"\\\" \\\"\\\"\\\"\";'
>>> re.compile('s:[0-9]+:\\"(.*?)\\";').findall(s)
['strin1', '\\"\\"\\"', 'string2', '52', 'string3', '\xe2\x80\x8b08\\/23\\/2011 00:00', 'where', '\\" \\"\\"\\"']
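
For reference, here is a rough Java sketch of the same non-greedy approach. The class name SerializedStrings and the helper extract are made up for illustration, and the optional second colon (:?) is an assumption added so that one pattern covers both the s:3"..." and the s:11:"..." forms shown above. Like the Python version, it would stop early if a value itself contained the two characters ";.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SerializedStrings {
    // s:<digits>, an optional second colon, an opening quote,
    // then the shortest run of characters that is followed by ";
    private static final Pattern ENTRY = Pattern.compile("s:\\d+:?\"(.*?)\";");

    // Hypothetical helper: collects capture group 1 of every match.
    static List<String> extract(String input) {
        List<String> contents = new ArrayList<>();
        Matcher matcher = ENTRY.matcher(input);
        while (matcher.find()) {
            contents.add(matcher.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String s = "s:3\"content\";s:15\"another content\";";
        System.out.println(extract(s)); // prints [content, another content]
    }
}
```

The doubled backslash in \\d is Java string escaping for the regex token \d, and \" is just a literal quote inside the Java string, so the compiled regex is s:\d+:?"(.*?)";.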

Upvotes: 2

Tristian

Reputation: 3512

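How about s:\d+"([a-zA-Z0-9 ]*)"; ? It matches the format s:3"My Content"; and captures the content string. Note that the character class only allows letters, digits, and spaces, so content containing punctuation will not match.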



import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentExtractor {
    // For a long concatenated String
    private static String concatenatedPattern = "s:\\d+\"([a-zA-Z0-9 ]*)\";";
    private static String concatenatedString = "s:22\"This is a test string\";s:10\"Another Test\";";

    // For individual tokens, e.g. after splitting the concatenated String
    private static String tokensPattern = "^s:\\d+\"([a-zA-Z0-9 ]*)\"$";
    private static String[] tokens = new String[]{
            "s:3\"two\"",
            "s:10\"Token Content\""
    };

    public static void main(String[] args) {
        Matcher matcher;

        System.out.println("======= Long concatenated String======");
        matcher = Pattern.compile(concatenatedPattern).matcher(concatenatedString);
        while( matcher.find() ) {
            System.out.println( matcher.group(1) );
        }


        System.out.println("==== single tokens ====");
        for (String token : tokens) {
            matcher = Pattern.compile(tokensPattern).matcher(token);
            if (matcher.find()) {
                System.out.println( matcher.group(1) );
            }
        }
    }
}

Upvotes: 0
