Reputation: 1416
Does anyone know what I am doing wrong? I have this sentence:
hi [user=1234]John Jack[/user] take me home
and I need a regex that selects only John Jack.
My regex:
(\[user=\d\d\d\d](.+?)\[\/user\])(?!(\[user=\d\d\d\d\])|(\[\/user\]))
I want to exclude [user=1234] and [/user]. This:
(\[user=\d\d\d\d](.+?)\[\/user\])
selects [user=1234]John Jack[/user], but I want only John Jack.
Full example:
hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]
Upvotes: 1
Views: 407
Reputation: 14853
Full decoding:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExpPattern_002 {
    public static void main( String[] args ) {
        final String text =
            "hi [user=1234]John Jack[/user] take me home."
            + " [user=12] Jonno Ha [/user]"
            + " where you are [differentTag] hm? [/differentTag]."
            + " Peter Im here with [user=1]Danny Di [/user]";
        // Groups: 1 = text before the tag, 2 = opening tag name,
        // 4 = optional value after '=', 5 = tag content, 6 = closing tag name
        final Pattern p = Pattern.compile(
            "([^\\[]*)\\[(\\w+)(=([^\\]]+))?\\]([^\\[]*)\\[/(\\w+)\\]" );
        final Matcher m = p.matcher( text );
        while( m.find()) {
            final String preText = m.group( 1 );
            final String attrOpen = m.group( 2 );
            final String value = m.group( 4 ); // null when the tag has no "=value" part
            final String content = m.group( 5 );
            final String attrClose = m.group( 6 );
            assert attrClose.equals( attrOpen ); // only checked when run with -ea
            System.err.printf(
                "pre = '%s', attr = '%s', value = '%s', content = '%s'\n",
                preText, attrOpen, value, content );
            System.err.println("-----------------------------");
        }
    }
}
Execution log:
pre = 'hi ', attr = 'user', value = '1234', content = 'John Jack'
-----------------------------
pre = ' take me home. ', attr = 'user', value = '12', content = ' Jonno Ha '
-----------------------------
pre = ' where you are ', attr = 'differentTag', value = 'null', content = ' hm? '
-----------------------------
pre = '. Peter Im here with ', attr = 'user', value = '1', content = 'Danny Di '
-----------------------------
Upvotes: 2
Reputation: 23784
I assume you don't need any Java code; otherwise please leave a comment and I will delete the answer.
To exclude [user=1234] and [/user] you can use:
[^\]\[a-zA-Z=\d\/]
and for matching other parts:
[a-zA-Z ]*[^\]\[a-zA-Z=\d\/][a-zA-Z]*
and for input:
hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\user]
you can use:
[a-zA-Z ]*[^\]\[a-zA-Z=\d\/\\]+[a-zA-Z ]*
It matches everything except what is inside the square brackets.
It excludes:
[user=1234]
[/user]
[user=12]
[\user]
[differentTag]
[/differentTag]
[user=1]
[\user]
If you want to match only the user name before [/user] or [\user], you can try:
[a-zA-Z ]+(?=\[(?:\\|\/)user\])
it matches:
John Jack
Jonno Ha
Danny Di
A more efficient variant of the above:
(?<=])[a-zA-Z ]+(?=\[(?:\\|\/))
and still matches:
John Jack
Jonno Ha
Danny Di
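Since the question is about Java, here is a minimal sketch of how the lookahead pattern above could be used with java.util.regex. The class name LookaheadNames and the method names(...) are made up for this illustration; the only real change to the pattern is that the backslashes are doubled for a Java string literal and the alternation (?:\\|\/) is written as the equivalent character class [/\\].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadNames {
    // Letters and spaces that are immediately followed by [/user] or [\user].
    // The lookahead checks for the closing tag without adding it to the match.
    private static final Pattern NAME =
            Pattern.compile("[a-zA-Z ]+(?=\\[[/\\\\]user\\])");

    // Hypothetical helper: collects every matched name, trimmed.
    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home."
                + " [user=12] Jonno Ha [\\user] where you are"
                + " [differentTag] hm? [/differentTag]."
                + " Peter Im here with [user=1]Danny Di [\\user]";
        System.out.println(names(text)); // [John Jack, Jonno Ha, Danny Di]
    }
}
```

Keep in mind that [a-zA-Z ]+ only covers letters and spaces, so a name containing digits or punctuation would be cut off or missed.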
Upvotes: 0
Reputation: 124225
(.+?) is indexed as group 2 and should hold John Jack, so you should be able to obtain it via matcher.group(2).
Demo:
String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("(\\[user=\\d\\d\\d\\d](.+?)\\[\\/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\]))");
Matcher m = p.matcher(text);
if(m.find()){
System.out.println(m.group(2));
}
Output: John Jack
If you want to find more users, you need to change if to while and fix your regex, because it requires exactly four digits and therefore will not match [user=12] or [user=1]. So instead of \d\d\d\d you can use \d+. Your full example also contains not only [user=ID]..[/user] but also [user=ID]..[\user], so the regex has to accept both / and \ in the closing tag.
BTW, since Java doesn't use the /regex/flags syntax, / is not considered a special character, so you don't need to escape it.
Also, I am not sure why you need (?!(\\[user=\\d\\d\\d\\d\\])|(\\[\\/user\\])) at the end of your regex. It doesn't really do anything in the example you showed, so it looks like it can be removed. We also don't need to surround the earlier part with parentheses, because a look-ahead doesn't add anything to the whole match, which is already available as group 0, so there is no need for a separate group that duplicates it. After removing those extra parentheses, (.+?) will be indexed as group 1.
A modified and simplified solution can look like:
String text = "hi [user=1234]John Jack[/user] take me home. [user=12] Jonno Ha [\\user] where you are [differentTag] hm? [/differentTag]. Peter Im here with [user=1]Danny Di [\\user]";
Pattern p = Pattern.compile("\\[user=\\d+](.+?)\\[(/|\\\\)user]");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(1).trim());
}
Output:
John Jack
Jonno Ha
Danny Di
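To see the group numbering described above, a small sketch like the following can help. It dumps every group of the first match for the original regex and for a simplified one. The class name GroupIndexDemo, the helper groups(...) and the two constants are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // The regex from the question, with / left unescaped (Java does not need it escaped).
    static final String ORIGINAL =
            "(\\[user=\\d\\d\\d\\d](.+?)\\[/user\\])(?!(\\[user=\\d\\d\\d\\d\\])|(\\[/user\\]))";
    // Outer parentheses and look-ahead removed, \d+ instead of four digits.
    static final String SIMPLIFIED = "\\[user=\\d+](.+?)\\[/user]";

    // Hypothetical helper: returns group 0..groupCount() of the first match,
    // or an empty array when nothing matches.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i); // null for groups that did not take part in the match
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "hi [user=1234]John Jack[/user] take me home";
        // [[user=1234]John Jack[/user], [user=1234]John Jack[/user], John Jack, null, null]
        System.out.println(Arrays.toString(groups(ORIGINAL, text)));
        // [[user=1234]John Jack[/user], John Jack]
        System.out.println(Arrays.toString(groups(SIMPLIFIED, text)));
    }
}
```

Groups are numbered by the position of their opening parenthesis, which is why the groups inside the negative look-ahead still count (as 3 and 4) even though they never capture anything.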
Upvotes: 3
Reputation: 1126
As an alternative to the answer from @matoni, which uses "lookahead" and "lookbehind" syntax, you can use groups (which are already defined in your pattern) and extract the appropriate group:
String s = "hi [user=1234]John Jack[/user] take me home ...";
// .+? is reluctant, so each match stops at the nearest [/user]
Pattern p = Pattern.compile("\\[user=\\d+\\](.+?)\\[/user\\]");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}
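One thing worth checking with the group approach is the quantifier inside the group. A greedy .+ and a reluctant .+? give the same result when the input has a single tag pair, but they differ once there are several. The sketch below compares the two; the class name GreedyVsReluctant, the helper contents(...) and the shortened sample text are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: returns group 1 of every match of the given regex.
    static List<String> contents(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "[user=1234]John Jack[/user] and [user=12]Jonno Ha[/user]";

        // Greedy: .+ runs to the LAST [/user], swallowing everything in between.
        // [John Jack[/user] and [user=12]Jonno Ha]
        System.out.println(contents("\\[user=\\d+\\](.+)\\[/user\\]", text));

        // Reluctant: .+? stops at the NEAREST [/user], one match per tag pair.
        // [John Jack, Jonno Ha]
        System.out.println(contents("\\[user=\\d+\\](.+?)\\[/user\\]", text));
    }
}
```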
Upvotes: 4
Reputation:
Try this:
String s = "hi [user=1234]John Jack[/user] take me home";
// assuming the user id always has 4 digits
Pattern p = Pattern.compile("(?<=\\[user=\\d{4}\\]).+(?=\\[/user\\])");
Matcher m = p.matcher(s);
// check the result of find(); calling start()/end() without a match throws IllegalStateException
if (m.find()) {
    System.out.println(s.substring(m.start(), m.end()));
}
Note that you cannot use a pattern of unbounded variable length, such as (?<=.+), in a "lookbehind". So if you know that the user id has at most, e.g., 11 digits, you can use:
Pattern.compile("(?<=\\[user=\\d{4,11}\\]).+(?=\\[/user\\])");
For more details about regex, see the Pattern javadoc.
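For input with several tags, the bounded lookbehind can be combined with a character class that cannot cross into the next tag. This is only a sketch under a few assumptions: ids have 1 to 11 digits, the closing tag is always [/user], and the tag content never contains [. The class name BoundedLookbehind and the helper names(...) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // (?<=\[user=\d{1,11}\])  lookbehind with a bounded length (1 to 11 digits, assumed)
    // [^\[]+                  content up to the next '[' so a match cannot span two tags
    // (?=\[/user\])           lookahead for the closing tag
    private static final Pattern NAME = Pattern.compile(
            "(?<=\\[user=\\d{1,11}\\])[^\\[]+(?=\\[/user\\])");

    // Hypothetical helper: returns every tag content, trimmed.
    static List<String> names(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = NAME.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "hi [user=1234]John Jack[/user] take me home."
                + " [user=12] Jonno Ha [/user]";
        System.out.println(names(s)); // [John Jack, Jonno Ha]
    }
}
```

Because both tags are handled by lookarounds, the whole match is already the name, so m.group() can be used directly instead of a numbered group.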
Upvotes: 2