pavol.franek

Reputation: 1416

Regex exclude word from matches

Does anyone know what I am doing wrong? I have this sentence:

hi [user=1234]John Jack[/user] take me home

and I need a regex that selects only John Jack

My regex:

(\[user=\d\d\d\d](.+?)\[\/user\])(?!(\[user=\d\d\d\d\])|(\[\/user\]))

I want to exclude [user=1234] and [/user]

This (\[user=\d\d\d\d](.+?)\[\/user\]) selects [user=1234]John Jack[/user], but I want only John Jack

Full example:

hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]

Upvotes: 1

Views: 407

Answers (5)

Aubin

Reputation: 14853

Full decoding:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExpPattern_002 {

   public static void main( String[] args ) {
      final String text =
         "hi [user=1234]John Jack[/user] take me home."
         + " [user=12] Jonno Ha [/user]"
         + " where you are [differentTag] hm? [/differentTag]."
         + " Peter Im here with [user=1]Danny Di [/user]";
      final Pattern p = Pattern.compile(
         "([^\\[]*)\\[(\\w+)(=([^\\]]+))?\\]([^\\[]*)\\[/(\\w+)\\]" );
      final Matcher m = p.matcher( text );
      while( m.find()) {
         final String preText   = m.group( 1 );
         final String attrOpen  = m.group( 2 );
         final String value     = m.group( 4 );
         final String content   = m.group( 5 );
         final String attrClose = m.group( 6 );
         assert attrClose.equals( attrOpen );
         System.err.printf(
            "pre = '%s', attr = '%s', value = '%s', content = '%s'\n",
            preText, attrOpen, value, content );
         System.err.println("-----------------------------");
      }
   }
}

Execution log:

pre = 'hi ', attr = 'user', value = '1234', content = 'John Jack'
-----------------------------
pre = ' take me home. ', attr = 'user', value = '12', content = ' Jonno Ha '
-----------------------------
pre = ' where you are ', attr = 'differentTag', value = 'null', content = ' hm? '
-----------------------------
pre = '. Peter Im here with ', attr = 'user', value = '1', content = 'Danny Di '
-----------------------------

Upvotes: 2

Shakiba Moshiri

Reputation: 23784

I assume you don't need any code; otherwise, please leave a comment and I will delete the answer.

To exclude [user=1234] and [/user] you can use:

[^\]\[a-zA-Z=\d\/]

and for matching other parts:

[a-zA-Z ]*[^\]\[a-zA-Z=\d\/][a-zA-Z]*

and for input:

hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]

you can use:

[a-zA-Z ]*[^\]\[a-zA-Z=\d\/\\]+[a-zA-Z ]* 

It matches everything except what is inside [].

It excludes:

[user=1234]
[/user]
[user=12]
[\user]
[differentTag]
[/differentTag]
[user=1]
[\user]

If you want to match only the user name before [/user] or [\user], you can try:

[a-zA-Z ]+(?=\[(?:\\|\/)user\]) 

it matches:

John Jack  
Jonno Ha 
Danny Di 

A variant that should be more efficient than the one above:

(?<=])[a-zA-Z ]+(?=\[(?:\\|\/))  

and still matches:

John Jack  
Jonno Ha 
Danny Di 
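
Since the question is about Java, here is a minimal sketch of how the look-ahead pattern above could be applied there. The class and method names (UserNameLookahead, names) are made up for illustration, and the trim() call is my addition, because the raw matches keep the spaces around the name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class UserNameLookahead {

    // Letters and spaces that are immediately followed by [/user] or [\user].
    private static final Pattern NAME =
        Pattern.compile("[a-zA-Z ]+(?=\\[(?:\\\\|/)user\\])");

    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            // The match itself may start or end with a space, so trim it.
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [\\user] where you are"
            + " [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [\\user]";
        System.out.println(names(text)); // [John Jack, Jonno Ha, Danny Di]
    }
}
```

Note that the character class only allows ASCII letters and spaces, so a name containing digits or punctuation would be cut short.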

Upvotes: 0

Pshemo

Reputation: 124225

(.+?) is group 2 and it should hold John Jack, so you should be able to obtain it via matcher.group(2).

Demo:

String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("(\\[user=\\d\\d\\d\\d](.+?)\\[\\/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\]))");
Matcher m = p.matcher(text);
if(m.find()){
    System.out.println(m.group(2));
}

Output: John Jack


If you want to find more users, you need to change if to while and fix your regex, because:

  • you are currently searching for users with 4-digit IDs, so it will fail to match [user=12] or [user=1]. Instead of \d\d\d\d you can use \d+.
  • your text closes the tag not only as [user=ID]..[/user] but also as [user=ID]..[\user] (both / and \ are used), so the regex has to accept both.
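
To make the two points above concrete, here is a small sketch that counts how many tags each variant of the regex finds in the full example. The class UserTagCount and its count method are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only meant to compare the regex variants.
class UserTagCount {

    // Counts how many times the regex matches in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [\\user] where you are"
            + " [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [\\user]";

        // Exactly four digits, closing tag only with '/': finds 1 user.
        System.out.println(count("\\[user=\\d\\d\\d\\d](.+?)\\[/user]", text));
        // Any number of digits, but still only '/': still finds 1 user.
        System.out.println(count("\\[user=\\d+](.+?)\\[/user]", text));
        // Any number of digits and either '/' or '\': finds all 3 users.
        System.out.println(count("\\[user=\\d+](.+?)\\[[/\\\\]user]", text));
    }
}
```

Only the last variant, which fixes both problems, matches all three users.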

BTW, since Java doesn't use the /regex/flags syntax, / is not a special character, so you don't need to escape it.

Also, I am not sure why you need (?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\])) at the end of your regex; it doesn't do anything in the example you showed, so it looks like it can be removed. We also don't need to surround the earlier part with parentheses, because a look-ahead doesn't add anything to the whole match, which is already available as group 0, so a separate group would only duplicate that match. After removing those extra parentheses, (.+?) will be indexed as group 1.
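
The group numbering described above can be checked with a short sketch. GroupNumbering and its group method are hypothetical names; SIMPLIFIED is your regex with only the outer parentheses and the trailing look-ahead removed (it still expects a 4-digit ID).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing how capturing groups are numbered.
class GroupNumbering {

    // The regex from the question: the outer parentheses are group 1, (.+?) is group 2.
    static final String ORIGINAL =
        "(\\[user=\\d\\d\\d\\d](.+?)\\[\\/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\]))";

    // Same regex without the outer parentheses and the look-ahead: (.+?) is now group 1.
    static final String SIMPLIFIED = "\\[user=\\d\\d\\d\\d](.+?)\\[/user]";

    // Returns the requested group of the first match, or null if there is no match.
    static String group(String regex, String text, int index) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home";
        System.out.println(group(ORIGINAL, text, 0));   // [user=1234]John Jack[/user]
        System.out.println(group(ORIGINAL, text, 1));   // [user=1234]John Jack[/user]
        System.out.println(group(ORIGINAL, text, 2));   // John Jack
        System.out.println(group(SIMPLIFIED, text, 1)); // John Jack
    }
}
```

Groups are numbered by the position of their opening parenthesis, which is why group 1 of the original regex just repeats group 0.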

Modified and simplified solution can look like:

String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("\\[user=\\d+](.+?)\\[(/|\\\\)user]");
Matcher m = p.matcher(text);
while(m.find()){
    System.out.println(m.group(1).trim()); 
}

Output:

John Jack
Jonno Ha
Danny Di

Upvotes: 3

Vladimir L.

Reputation: 1126

As an alternative to the answer from @matoni, which uses the "lookahead" and "lookbehind" syntax, you can use grouping (a group is already defined in your pattern) and extract the appropriate group:

    String s = "hi [user=1234]John Jack[/user] take me home ...";
    // reluctant .+? so that each [user=..]..[/user] pair is matched separately
    Pattern p = Pattern.compile("\\[user=\\d+\\](.+?)\\[/user\\]");
    Matcher m = p.matcher(s);
    while (m.find()) {
        System.out.println(m.group(1));
    }
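
One thing to watch when extracting a group in a loop like this is whether the quantifier inside the group is greedy (.+) or reluctant (.+?). The sketch below compares the two on a line with two tags; the class GreedyVsReluctant, the method contents, and the sample text are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper comparing a greedy and a reluctant capturing group.
class GreedyVsReluctant {

    // Collects group 1 of every match of the regex in the text.
    static List<String> contents(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "[user=1]Ann[/user] and [user=2]Bob[/user]";
        // Greedy: one match that runs up to the LAST [/user].
        System.out.println(contents("\\[user=\\d+\\](.+)\\[/user\\]", text));
        // prints [Ann[/user] and [user=2]Bob]

        // Reluctant: one match per tag pair.
        System.out.println(contents("\\[user=\\d+\\](.+?)\\[/user\\]", text));
        // prints [Ann, Bob]
    }
}
```

With a single tag in the input both forms give the same result; the difference only shows up once there are several tags.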

Upvotes: 4

user4413257

Reputation:

Try this:

String s = "hi [user=1234]John Jack[/user] take me home";
// assuming the user id always has 4 digits
Pattern p = Pattern.compile("(?<=\\[user=\\d{4}\\]).+(?=\\[/user\\])");
Matcher m = p.matcher(s);
// check find() first: group() throws IllegalStateException if nothing matched
if (m.find()) {
    System.out.println(m.group()); // same as s.substring(m.start(), m.end())
}

Note that you cannot rely on a "lookbehind" pattern of unbounded length, such as (?<=.+). So if you know that the user id has at most, e.g., 11 digits, you can use a bounded quantifier:

Pattern.compile("(?<=\\[user=\\d{4,11}\\]).+(?=\\[/user\\])");
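
Here is a minimal sketch of the bounded lookbehind applied to several tags. The class name BoundedLookbehind and the method contents are hypothetical, and I am assuming ids of 1 to 11 digits and a reluctant .+? so that each tag pair is matched on its own; adjust the bounds to your data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, assuming user ids with 1 to 11 digits.
class BoundedLookbehind {

    // The lookbehind has an obvious maximum length because of {1,11}.
    private static final Pattern CONTENT =
        Pattern.compile("(?<=\\[user=\\d{1,11}\\]).+?(?=\\[/user\\])");

    static List<String> contents(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CONTENT.matcher(text);
        while (m.find()) {
            // The whole match is the content; the tags are only looked at, not consumed.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
            + " [user=12]Jonno Ha[/user]";
        System.out.println(contents(text)); // [John Jack, Jonno Ha]
    }
}
```

Because the tags sit inside lookarounds, m.group() already returns just the content and no capturing group is needed.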

For more details about regex see: Pattern javadoc

Upvotes: 2
