evelyn

Reputation: 89

Java: split a string on non-alphabetic characters and new lines?

I have a test.txt file containing several lines, for example:

"h3llo, @my name is, bob! (how are you?)"

"i am fine@@@@@"

I want to split the text on every non-alphabetic character, including the new line, and collect the resulting words in an ArrayList, so the output would be:

output = ["h", "llo", "my", "name", "is", "bob", "how", "are", "you", "i", "am", "fine"]

So far I have tried splitting my text with:

output.split("\\P{Alpha}+")

But for some reason this puts an empty string in the first spot of the ArrayList, and replaces the new line with another empty string:

output = ["", "h", "llo", "my", "name", "is", "bob", "how", "are", "you", "", "i", "am", "fine"]

Is there another way to fix this? Thank you!

--

EDIT: How can I make sure it ignores the new line?

Upvotes: 6

Views: 254

Answers (3)

Andrew Mairose

Reputation: 10995

Use your regex, put the result in an ArrayList (as that's what you want the data in at the end anyway), then just use removeIf to remove any empty strings.

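import java.util.ArrayList;
import java.util.Arrays;

String input = "\"h3llo, @my name is, bob! (how are you?)\"\n\n\"i am fine@@@@@\"";

// Split on runs of non-letters and copy the result into a mutable list.
ArrayList<String> arrayList = new ArrayList<>(Arrays.asList(input.split("\\P{Alpha}+")));
// Drop the empty string left by the leading delimiter (removeIf needs Java 8+).
arrayList.removeIf(""::equals);

System.out.println(arrayList);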

Result:

[h, llo, my, name, is, bob, how, are, you, i, am, fine]
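As a hedged aside on where the empty strings come from: String.split keeps a leading empty string whenever the input starts with a delimiter, and each line in the question starts with a quote character. If the file is read and split one line at a time (an assumption, since the question does not show the reading code), every line would contribute its own leading "". The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeadingEmptyDemo {
    // Splits one line on runs of non-letters and drops any empty tokens.
    static List<String> words(String line) {
        List<String> out = new ArrayList<>(Arrays.asList(line.split("\\P{Alpha}+")));
        out.removeIf(String::isEmpty);
        return out;
    }

    public static void main(String[] args) {
        // The line starts with a quote, which is itself a delimiter.
        String line = "\"i am fine@@@@@\"";

        // Raw split: the leading delimiter yields a leading empty string,
        // while the trailing delimiters yield nothing.
        System.out.println(Arrays.toString(line.split("\\P{Alpha}+"))); // [, i, am, fine]

        // After filtering, only the words remain.
        System.out.println(words(line)); // [i, am, fine]
    }
}
```

Filtering after the split, as in the answer above, handles both the single-string and the line-by-line case.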

Upvotes: 0

sashok_bg

Reputation: 2971

Another solution is to use the java.util.regex package. Instead of splitting on the delimiters, you match the words themselves with Pattern and Matcher, so no empty strings are produced.

    String input = "h3llo, @my name is, bob! (how are you?)\n"+
            "i am fine@@@@@";

    Pattern p = Pattern.compile("([a-zA-Z]+)");
    Matcher m = p.matcher(input);

    List<String> tokens = new ArrayList<String>();
    while (m.find()) {
        System.out.println("Found a " + m.group());
        tokens.add(m.group());
    }
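
The same matching approach can be written more compactly as a stream. This is only a sketch, assuming Java 9 or later (Matcher.results() is not available before that); the class name WordMatcherDemo and the method words are hypothetical.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordMatcherDemo {
    // One or more ASCII letters; compiled once and reused.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Collects every run of letters in the text, in order of appearance.
    static List<String> words(String text) {
        return WORD.matcher(text)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)   // the matched text of each result
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "h3llo, @my name is, bob! (how are you?)\n"
                + "i am fine@@@@@";
        System.out.println(words(input));
        // [h, llo, my, name, is, bob, how, are, you, i, am, fine]
    }
}
```

Because only matches are collected, leading delimiters and new lines never show up as empty tokens.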

P.S. A good tool for testing your regex pattern is https://regex101.com/

Upvotes: 0

dimo414

Reputation: 48804

Java's String.split() behavior is pretty confusing. A much better splitting utility is Guava's Splitter. Their documentation goes into more detail about the problems with String.split():

The built in Java utilities for splitting strings can have some quirky behaviors. For example, String.split silently discards trailing separators, and StringTokenizer respects exactly five whitespace characters and nothing else.

Quiz: ",a,,b,".split(",") returns...

  1. "", "a", "", "b", ""
  2. null, "a", null, "b", null
  3. "a", null, "b"
  4. "a", "b"
  5. None of the above

The correct answer is none of the above: "", "a", "", "b". Only trailing empty strings are skipped. What is this I don't even.
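
The quiz result is easy to check with plain Java. The sketch below also shows the two-argument overload: a negative limit makes String.split keep trailing empty strings as well. The class and method names are invented for this example.

```java
import java.util.Arrays;

final class SplitQuizDemo {
    // Default behavior: trailing empty strings are dropped, leading ones are kept.
    static String[] defaultSplit(String s) {
        return s.split(",");
    }

    // A negative limit keeps trailing empty strings too.
    static String[] keepTrailing(String s) {
        return s.split(",", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultSplit(",a,,b,"))); // [, a, , b]
        System.out.println(Arrays.toString(keepTrailing(",a,,b,"))); // [, a, , b, ]
    }
}
```

Neither variant removes the leading empty string, which is why the question's output starts with "".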

In your case this should work:

Splitter.onPattern("\\P{Alpha}+").omitEmptyStrings().splitToList(output);

Upvotes: 2
