kaalobaadar

Reputation: 429

Regular expression matching for location based info

I am working on a project in Java that gets location based information and provides users with relevant info. While working on this, I get info in a text file with the following format:

[loc.x.1234] has logged in. Connects to [loc.x.983]
[loc.x.3427] has left the room.

The info is always in square brackets ( [ ] ) and is of the format (string.string.string).

My objective is to extract the user information from these file feeds. My output should be in the following format:

loc.x.1234,loc.x.983
loc.x.3427

Although I've been programming for some months, I am not familiar with the use of regular expressions. Any help with this would be appreciated.

Upvotes: 0

Views: 85

Answers (2)

monk

Reputation: 228

This may help you:

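    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Sample input: the two log lines from the question.
    String a = "[loc.x.1234] has logged in. Connects to [loc.x.983]\n" +
            "[loc.x.3427] has left the room.";
    // Lazily match whatever sits between "[" and "]", excluding the brackets.
    String regExp = "(?<=\\[).*?(?=\\])";
    Pattern p = Pattern.compile(regExp);
    Matcher m = p.matcher(a);
    while (m.find()) {
        System.out.println(m.group()); // one bracketed token per output line
    }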

The lookbehind "(?<=\[)" and the lookahead "(?=\])" require a "[" before and a "]" after the match, but leave both brackets out of the matched text.
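To get the comma-separated, one-line-per-input-line output the question asks for, the same pattern could be applied to each line separately. This is only a sketch: the class name `BracketExtractor` and the method `extractLine` are made up for illustration, and the input lines are hard-coded rather than read from the text file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BracketExtractor {
    // Same lookaround pattern as above: text between "[" and "]", brackets excluded.
    private static final Pattern BRACKETED = Pattern.compile("(?<=\\[).*?(?=\\])");

    // Collects every bracketed token on one line and joins them with commas.
    public static String extractLine(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = BRACKETED.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return String.join(",", tokens);
    }

    public static void main(String[] args) {
        // Assumption: in the real program these lines would come from the file feed.
        String[] lines = {
            "[loc.x.1234] has logged in. Connects to [loc.x.983]",
            "[loc.x.3427] has left the room."
        };
        for (String line : lines) {
            System.out.println(extractLine(line));
        }
    }
}
```

A line with no bracketed token yields an empty string here; whether such lines should be skipped instead depends on the feed.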

Upvotes: 2

Ethan Brown

Reputation: 27282

I'm sure a bunch of people are about to point out how StackOverflow is not a tutorial site, so be warned....

It would behoove you to learn about regular expressions. I recommend Michael Fitzgerald's excellent Introducing Regular Expressions.

Here's how you would tackle your problem with regular expressions:

\[(\w+)\.(\w+)\.(\w+)\]

Let's break it down. First, square brackets are metacharacters in regex: they normally delimit a character class. Since we want to match square brackets literally, we have to escape them (just like you have to escape quotation marks in a Java string). The escape character is the backslash, so the first thing this regex matches is a literal [ character. (Inside a Java string literal each backslash must itself be doubled, so \[ is written as "\\[".)

Parentheses provide grouping, which generally serves two purposes. First, they group sub-expressions, allowing you to construct more complicated expressions. Secondly, they provide a way to "remember" what it was that was matched. In our case, we're using it to "remember" each of the three strings inside the square brackets.

Then, we use the metacharacter \w. This is regex shorthand for "letters, numbers, and the underscore", which is probably what you want in this case. There are other options if you don't. (For example, if you want to match whitespace as well, you would either do [\w\s], or you might just say [^.\]] to match anything that's not a period or close square bracket.)
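As a rough illustration of the difference between \w and the negated character class, here is a small sketch. The class name `CharClassDemo` and the input "[main hall.x.1234]" are invented for the example; nothing in the question says such values actually occur.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two character-class choices.
class CharClassDemo {
    // Each part may only contain letters, digits, and underscores.
    public static final Pattern WORD_ONLY =
            Pattern.compile("\\[(\\w+)\\.(\\w+)\\.(\\w+)\\]");

    // Each part may contain anything except a period or a closing bracket.
    public static final Pattern NOT_DOT_OR_BRACKET =
            Pattern.compile("\\[([^.\\]]+)\\.([^.\\]]+)\\.([^.\\]]+)\\]");

    public static void main(String[] args) {
        String plain = "[loc.x.1234]";
        String spaced = "[main hall.x.1234]"; // made-up input containing a space

        System.out.println(WORD_ONLY.matcher(plain).matches());           // true
        System.out.println(WORD_ONLY.matcher(spaced).matches());          // false
        System.out.println(NOT_DOT_OR_BRACKET.matcher(plain).matches());  // true
        System.out.println(NOT_DOT_OR_BRACKET.matcher(spaced).matches()); // true
    }
}
```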

Then comes a + which is the regex metacharacter for "one or more". This means that at least one word character must be matched, and possibly more. If you want the possibility for empty strings, use the * metacharacter instead, which means "zero or more".
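A quick sketch of what switching from + to * changes, using an invented input with an empty middle part ("[loc..1234]"); the class name `QuantifierDemo` is likewise just for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting "one or more" with "zero or more".
class QuantifierDemo {
    public static final Pattern ONE_OR_MORE =
            Pattern.compile("\\[(\\w+)\\.(\\w+)\\.(\\w+)\\]");
    public static final Pattern ZERO_OR_MORE =
            Pattern.compile("\\[(\\w*)\\.(\\w*)\\.(\\w*)\\]");

    public static void main(String[] args) {
        String emptyMiddle = "[loc..1234]"; // made-up input: middle part is empty

        // With +, the empty middle part makes the whole match fail.
        System.out.println(ONE_OR_MORE.matcher(emptyMiddle).matches()); // false

        // With *, the match succeeds and group 2 is the empty string.
        Matcher m = ZERO_OR_MORE.matcher(emptyMiddle);
        if (m.matches()) {
            System.out.println("middle part: \"" + m.group(2) + "\"");
        }
    }
}
```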

Between the sub-strings, we have to escape the period, because that has special meaning in regex.

Once you match on this regex, for each match, you'll get three groups, one for each of your three strings.
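In Java, that could look something like the sketch below. The class name `LocParser` and the method `extractLine` are placeholders I made up, and the input lines are hard-coded where the real program would read them from the file. Note the doubled backslashes, which Java string literals require.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class built around the regex explained above.
class LocParser {
    private static final Pattern LOC =
            Pattern.compile("\\[(\\w+)\\.(\\w+)\\.(\\w+)\\]");

    // Finds every [a.b.c] token on the line and joins them as "a.b.c,a.b.c".
    public static String extractLine(String line) {
        StringJoiner out = new StringJoiner(",");
        Matcher m = LOC.matcher(line);
        while (m.find()) {
            // Groups 1..3 hold the three strings between the brackets.
            out.add(m.group(1) + "." + m.group(2) + "." + m.group(3));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumption: sample lines taken from the question, not read from a file.
        System.out.println(extractLine("[loc.x.1234] has logged in. Connects to [loc.x.983]"));
        System.out.println(extractLine("[loc.x.3427] has left the room."));
    }
}
```

If you only ever need the whole "loc.x.1234" string and not its three parts, a single group around the entire inner expression would do; the three separate groups are useful when you want, say, just the numeric ID.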

Good luck, and do try to read some tutorials before asking a question on StackOverflow.

Upvotes: 2
