Shaheryar Rajper

Reputation: 69

string split pattern java

I am trying to split the String at index i of an array, using a regex that matches four or more spaces.

I found a lot of information here and on other sites, and came up with:

String[] parts = titlesAuthor[i].split("    ");

so that the split happens between the title and the author's name. The separator is either four or more spaces or is absent altogether.

Example:

titleAuthor[0] = Investigational drugs for autonomic dysfunction in Parkinson's disease          Perez-Lloret S

After running the above split, parts[0] comes up empty and parts[1] contains the complete string.

Please help!

Code:

for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    NodeList title = element.getElementsByTagName("TEXT");
    line = (Element) title.item(0);
    titlesAuthor[i] = getCharacterDataFromElement(line);
    System.out.println(titlesAuthor[i]);
    parts = titlesAuthor[i].split(" ");
    System.out.println(parts[0]);
    System.out.println(parts[1]);
}

Upvotes: 0

Views: 178

Answers (4)

Gavriel

Reputation: 19237

To match four or more spaces, add a + quantifier after the fourth space:

String[] parts = titlesAuthor[i].split("    +");

or:

String[] parts = titlesAuthor[i].split(" {4,}");

Update: it looks like your XML doesn't contain exactly what you think. In the code you provided, add:

System.out.println(i + ":" + titlesAuthor[i] + ";");

and you'll see some spaces or newlines at the beginning.
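A minimal sketch of the suggestion above, assuming the text read from the XML may carry a leading newline and indentation. The class name TitleAuthorSplit, the helper splitTitleAuthor, and the sample input are made up for illustration; the trim() call and the limit argument of 2 are additions that are not in the original code.

```java
import java.util.Arrays;

public class TitleAuthorSplit {
    // Splits "title<4 or more spaces>author" into at most two parts.
    // trim() strips leading/trailing whitespace (including newlines) first,
    // so indentation in the XML text does not produce a blank first element.
    static String[] splitTitleAuthor(String raw) {
        return raw.trim().split(" {4,}", 2);
    }

    public static void main(String[] args) {
        // Hypothetical input: a leading newline and indentation, then the record.
        String raw = "\n      Investigational drugs for autonomic dysfunction in Parkinson's disease"
                + "          Perez-Lloret S";
        String[] parts = splitTitleAuthor(raw);
        System.out.println(Arrays.toString(parts));
    }
}
```

If the separator is absent, the returned array has a single element, so it is worth checking parts.length before reading parts[1].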

Upvotes: 0

Floam

Reputation: 704

In your example, your code is splitting when it finds four consecutive spaces. The String that you are splitting above has ten consecutive spaces between:

"disease          Perez".

Thus, the split happens twice within that run of spaces. Pretend "#" is a space:

Investigational drugs for autonomic dysfunction in Parkinson's disease|SPLIT|""|SPLIT|##Perez-Lloret S

Your split will result in:

{[Investigational drugs for autonomic dysfunction in Parkinson's disease], [""], [##Perez-Lloret S]}

because your code found two instances of four spaces. parts[1] is an empty String (not null) because there was nothing between the two matches.
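The behaviour described above can be reproduced with a short sketch. The class name ExactFourSpaceSplit and the helpers spaces and splitOnFour are hypothetical, and the input is shortened to the last word of the title for readability.

```java
public class ExactFourSpaceSplit {
    // Builds a run of n spaces so the count is explicit in the code.
    static String spaces(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Splits on exactly four consecutive spaces, as in the question.
    static String[] splitOnFour(String s) {
        return s.split(spaces(4));
    }

    public static void main(String[] args) {
        // Ten spaces between the two halves: two matches of four, two spaces left over.
        String input = "disease" + spaces(10) + "Perez-Lloret S";
        String[] parts = splitOnFour(input);
        System.out.println(parts.length);
        for (String p : parts) {
            System.out.println("[" + p + "]");
        }
    }
}
```

This prints 3, then [disease], [], and [  Perez-Lloret S] with the two leftover spaces still attached to the author.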

Hope this helps!

Upvotes: 0

Gary Y Kim

Reputation: 79

This will skip the whitespace: split("\\s+"). Note that it splits on every run of whitespace, including the single spaces between words.
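A small sketch of what splitting on \s+ does to a title/author line, assuming a shortened version of the sample from the question. The class name WhitespaceRunSplit and the helper splitOnWhitespace are made up for illustration.

```java
import java.util.Arrays;

public class WhitespaceRunSplit {
    // Splits on any run of one or more whitespace characters.
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String input = "Parkinson's disease          Perez-Lloret S";
        // Every word becomes its own element, so title and author are not kept apart.
        System.out.println(Arrays.toString(splitOnWhitespace(input)));
    }
}
```

This prints [Parkinson's, disease, Perez-Lloret, S], which loses the boundary between title and author.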

Upvotes: 0

SomeDude

Reputation: 14238

Use the regex \s{4} (written as "\\s{4}" in a Java string literal).

Here 4 is the number of whitespace characters; you can change it to whatever number you want, or use \s{4,} to match four or more.

See the demo
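One difference worth illustrating is that \s matches tabs and newlines as well as spaces, whereas a literal space matches only spaces. The sketch below assumes a made-up input whose separator mixes tabs and spaces; the class and method names are hypothetical.

```java
public class WhitespaceClassSplit {
    // Matches four or more literal space characters only.
    static String[] splitOnSpaces(String s) {
        return s.split(" {4,}");
    }

    // Matches four or more whitespace characters of any kind (space, tab, newline, ...).
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s{4,}");
    }

    public static void main(String[] args) {
        // Separator is tab, space, tab, space, space: five whitespace characters,
        // but never four consecutive spaces.
        String input = "Title\t \t  Author";
        System.out.println(splitOnSpaces(input).length);
        System.out.println(splitOnWhitespace(input).length);
    }
}
```

This prints 1 and then 2: only the \s version finds the separator in this input.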

Upvotes: 1
