Eric Conner

Reputation: 10772

Regex for matching alternating sequences

I'm working in Java and having trouble matching a repeated sequence. I'd like to match something like:

a.b.c.d.e.f.g.

and be able to extract the text between the delimiters (e.g. return abcdefg) where the delimiter can be multiple non-word characters and the text can be multiple word characters. Here is my regex so far:

([\\w]+([\\W]+)(?:[\\w]+\2)*)

(Doesn't work)

I had intended to capture the delimiter in group 2 with this regex, and then call replaceAll on group 1 to replace the delimiter with the empty string, leaving only the text. I get the delimiter, but cannot get all the text.

Thanks for any help!

Upvotes: 3

Views: 1020

Answers (4)

Amarghosh

Reputation: 59461

Replace (\w+)(\W+|$) with $1. Make sure that the global flag is turned on (in Java, replaceAll already replaces every match).

It replaces each run of word characters followed by a run of non-word characters (or the end of the input) with just the word characters.

String line = "Am.$#%^ar.$#%^gho.$#%^sh";
line = line.replaceAll("(\\w+)(\\W+|$)", "$1");
System.out.println(line); // prints "Amarghosh"

Upvotes: 0

Rubens Farias

Reputation: 57976

Replace (\w+)\W+ with $1.

Upvotes: 1

miku

Reputation: 188144

Why not ...

  • find all occurrences of (\w+) and then concatenate them; or
  • find all non-word characters (\W+) and then use Matcher.replaceAll with an empty string?
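
Both options can be sketched roughly as follows. This is only an illustration under my own assumptions: the class name WordRuns and the method names joinWordRuns and stripDelimiters are made up for the example, and String.replaceAll is used as a shorthand for the Matcher.replaceAll call mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class WordRuns {
    // Option 1: find every run of word characters and concatenate them.
    static String joinWordRuns(String input) {
        Matcher m = Pattern.compile("\\w+").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    // Option 2: replace every run of non-word characters with the empty string.
    static String stripDelimiters(String input) {
        return input.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        String input = "a.b.c.d.e.f.g.";
        System.out.println(joinWordRuns(input));    // abcdefg
        System.out.println(stripDelimiters(input)); // abcdefg
    }
}
```

Note that neither option checks that the delimiter is the same each time, which the backreference in the question's regex was trying to enforce.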

Upvotes: 0

kennytm

Reputation: 523624

Why not use String.split?
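
A minimal sketch of what that might look like, assuming Java 8 or later for String.join. The class name SplitDemo and the method name joinAfterSplit are invented for this example.

```java
// Hypothetical demo class, named only for this example.
class SplitDemo {
    static String joinAfterSplit(String input) {
        // Split on runs of non-word characters. split() drops trailing
        // empty strings, so the final delimiter adds no extra element.
        String[] parts = input.split("\\W+");
        // Joining with an empty separator yields the text only.
        return String.join("", parts);
    }

    public static void main(String[] args) {
        System.out.println(joinAfterSplit("a.b.c.d.e.f.g.")); // abcdefg
    }
}
```

The parts array also gives you each piece of text individually, which helps if you need more than the concatenated result.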

Upvotes: 0
