Reputation: 163

How to use regex with String.split()

I have the following String:

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";

I want to convert it into a String array that looks like this:

String[] Title = {"Title1 Title2", "Title3 Title4", "Title5 Title6", "Title7"};

I am trying the following code.

String[] Title = fullPDFContex.split("\r\n\r\n|\r\n \r\n|\r\n");

But I am not getting the desired output.

Upvotes: 2

Answers (3)

Wiktor Stribiżew

Reputation: 626893

You need to split with a pattern that matches any run of whitespace containing at least one line break:

String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
String separator = "\\p{javaWhitespace}*\\R\\p{javaWhitespace}*";
String results[] = fullPDFContex.split(separator);
System.out.println(Arrays.toString(results));
// => [Title1 Title2, Title3 Title4, Title5 Title6, Title7]


The \\p{javaWhitespace}*\\R\\p{javaWhitespace}* pattern matches:

\\p{javaWhitespace}* - zero or more whitespace characters
\\R - a line break (available since Java 8; for Java 7 and older you may replace it with [\r\n])
\\p{javaWhitespace}* - zero or more whitespace characters.
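As a minimal sketch of the Java 7 fallback mentioned above, the same split can be written with [\r\n] in place of \R and a precompiled Pattern. The class name LineBreakSplitJava7 and the helper splitTitles are hypothetical names chosen only for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LineBreakSplitJava7 {
    // Same shape as the pattern above, but [\r\n] replaces \R so it also compiles on Java 7.
    private static final Pattern SEPARATOR =
            Pattern.compile("\\p{javaWhitespace}*[\\r\\n]\\p{javaWhitespace}*");

    // Hypothetical helper: splits on any whitespace run that contains a CR or LF.
    static String[] splitTitles(String text) {
        return SEPARATOR.split(text);
    }

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        System.out.println(Arrays.toString(splitTitles(fullPDFContex)));
        // => [Title1 Title2, Title3 Title4, Title5 Title6, Title7]
    }
}
```

Compiling the pattern once is only worthwhile if the split runs many times; for a single call, String.split with the same regex string behaves the same way.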

Alternatively, you may use a slightly more efficient pattern:

String separator = "[\\s&&[^\r\n]]*\\R\\s*";


Unfortunately, the \R construct cannot be used inside character classes, so the line break has to stay outside the brackets. The pattern matches:

[\\s&&[^\r\n]]* - zero or more whitespace chars other than CR and LF (character class subtraction, written as an intersection with a negated class)
\\R - a line break
\\s* - zero or more whitespace chars of any kind.
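One case the pattern does not cover on its own is text that begins with whitespace or line breaks: the separator then matches at index 0 and split returns an empty first element. A possible sketch, assuming that trimming the input first is acceptable (the trim() call and the names TitleSplitter and splitTitles are additions for illustration, not part of the answer):

```java
import java.util.Arrays;

class TitleSplitter {
    // Pattern from the answer: horizontal whitespace, one line break, then any whitespace.
    // \R requires Java 8 or newer.
    private static final String SEPARATOR = "[\\s&&[^\r\n]]*\\R\\s*";

    // Hypothetical helper: trim() is an added step that removes leading and
    // trailing whitespace so the result cannot start with an empty string.
    static String[] splitTitles(String text) {
        return text.trim().split(SEPARATOR);
    }

    public static void main(String[] args) {
        String withLeadingBreak = "\r\n Title1 Title2\r\nTitle3 Title4 \r\n\r\n";
        System.out.println(Arrays.toString(splitTitles(withLeadingBreak)));
        // => [Title1 Title2, Title3 Title4]
    }
}
```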

Upvotes: 2

fazen

Reputation: 61

With this code you get the output you want:

String[] Title = fullPDFContex.split(" *(\r\n ?)+ *");
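A small runnable sketch of this pattern on the string from the question follows. It also shows a limitation worth keeping in mind: the pattern assumes CRLF line endings, so text with bare \n line breaks is not split at all. The names SpaceCrlfSplit and splitTitles are hypothetical.

```java
import java.util.Arrays;

class SpaceCrlfSplit {
    // Hypothetical helper wrapping the pattern from this answer:
    // optional spaces, one or more CRLF pairs (each optionally followed by one space), optional spaces.
    static String[] splitTitles(String text) {
        return text.split(" *(\r\n ?)+ *");
    }

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        System.out.println(Arrays.toString(splitTitles(fullPDFContex)));
        // => [Title1 Title2, Title3 Title4, Title5 Title6, Title7]

        // With bare \n line endings the pattern finds no separator,
        // so the whole input comes back as a single element.
        System.out.println(splitTitles("Title1\nTitle2").length);
        // => 1
    }
}
```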

Upvotes: 0

sForSujit

Reputation: 985

Here is a solution that uses StringTokenizer. The tokens are collected in a List, which helps when the number of values in the string is not known in advance.

package com.sujit;

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class UserInput {

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        StringTokenizer token = new StringTokenizer(fullPDFContex, "\r\n");
        List<String> list = new ArrayList<>();
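        while (token.hasMoreTokens()) {
            // Trim each token and skip whitespace-only ones (e.g. the " " between two line breaks)
            String title = token.nextToken().trim();
            if (!title.isEmpty()) {
                list.add(title);
            }
        }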
        for (String string : list) {
            System.out.println(string);
        }
    }
}
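The same tokenize-then-collect idea can also be sketched with String.split and a stream, assuming Java 8 or newer. This variant trims each piece and drops blank ones, so the result contains only the four titles. The class name StreamTitleSplit and the method splitTitles are hypothetical.

```java
import java.util.Arrays;

class StreamTitleSplit {
    // Hypothetical helper: split on runs of CR/LF, trim each piece, drop blank pieces.
    static String[] splitTitles(String text) {
        return Arrays.stream(text.split("[\r\n]+"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String fullPDFContex = "Title1 Title2\r\nTitle3 Title4\r\n\r\nTitle5 Title6\r\n \r\n Title7 \r\n\r\n\r\n\r\n\r\n";
        for (String title : splitTitles(fullPDFContex)) {
            System.out.println(title);
        }
    }
}
```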

Upvotes: 0
