asdf

Reputation: 667

Java - processing a text file to identify each string delimited by whitespace or a symbol

Title may not be clear, but what I am trying to do is:

For a sample line of a text file (that may be a program):

public static void main(String[] args){

I want to build an array of every string as well as each symbol used. For this line I want:

ArrayList x = ["public", "static", "void", "main", "(", "String", "[", "]", "args", ")", "{"]

My first thought was to:

  1. Split the string on whitespace and all symbols to get the strings made of word characters.
  2. Add these to the ArrayList.
  3. Split the original line on word characters (or something similar) to get the symbols.
  4. Add these to the ArrayList.

Any ideas on the best way to do this? I can't really see a clear solution.

Upvotes: 0

Views: 82

Answers (2)

Michał Schielmann

Reputation: 1382

Although not very elegant, this does what you need:

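// Requires: import java.util.Arrays;
public static void main(String[] args)
{
    String inputString = "public static void main(String[] args){";
    // Symbols to extract, escaped for use inside a regex character class.
    String charsToFind = "\\[\\]\\{\\}\\(\\)";
    // Space-separated symbols first, then the input with its symbols blanked out.
    String[] outputArray = (inputString.replaceAll("[^" + charsToFind + "]", "").replaceAll("(?!^)", " ")
            + inputString.replaceAll("[" + charsToFind + "]", " "))
            .replaceAll("\\s+", " ").split(" ");
    System.out.println(Arrays.toString(outputArray));
}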

First we define the chars that you want to use as delimiters (the charsToFind variable). Then the code does the following:

  1. It removes everything except the defined chars and inserts a space after each remaining char.

    inputString.replaceAll("[^"+charsToFind+"]", "").replaceAll("(?!^)"," ")

  2. This leaves only your special characters, separated by spaces.

  3. Next it replaces all special characters in the original string with spaces and appends the result to the previous one:

    + inputString.replaceAll("[" + charsToFind + "]", " ")

  4. Finally it collapses runs of whitespace into single spaces and splits everything on spaces into a String[] array:

    .replaceAll("\\s+", " ").split(" ");

Output:

[(, [, ], ), {, public, static, void, main, String, args]
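To make the steps above easier to follow, here is a minimal sketch that runs them one at a time so the intermediate strings can be inspected. The class name StepTrace and the helper method names are made up for illustration; the regular expressions are the ones from the code above.

```java
import java.util.Arrays;

public class StepTrace {
    // Same symbols as charsToFind above, escaped for use inside a character class.
    static final String CHARS_TO_FIND = "\\[\\]\\{\\}\\(\\)";

    // Steps 1-2: keep only the symbols, then insert a space at every position except the start.
    static String symbolsOnly(String input) {
        return input.replaceAll("[^" + CHARS_TO_FIND + "]", "").replaceAll("(?!^)", " ");
    }

    // Step 3: blank out the symbols so that only the words remain.
    static String wordsOnly(String input) {
        return input.replaceAll("[" + CHARS_TO_FIND + "]", " ");
    }

    // Step 4: concatenate, collapse whitespace runs, and split on single spaces.
    static String[] tokenize(String input) {
        return (symbolsOnly(input) + wordsOnly(input)).replaceAll("\\s+", " ").split(" ");
    }

    public static void main(String[] args) {
        String input = "public static void main(String[] args){";
        // Brackets around the output make leading and trailing spaces visible.
        System.out.println("<" + symbolsOnly(input) + ">");
        System.out.println("<" + wordsOnly(input) + ">");
        System.out.println(Arrays.toString(tokenize(input)));
    }
}
```

Because the symbols and the words are collected in two separate passes, the symbols all appear before the words, so the original token order is not preserved.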

Fede's answer is more elegant, but after creating the array you need to remove the whitespace-only entries. I hope this solution helps.

Upvotes: 0

Federico Piazza

Reputation: 31035

You can split your text using a regex like this:

(?=[\s\W])|(?<=[\s\W])


This splits the text immediately before and after every whitespace or non-word character, so each word and each symbol becomes a separate entry.

Your code would be:

// Requires: import java.util.Arrays;
public void testSplit()
{
    String str = "public static void main(String[] args){";
    // Split at every position directly before or after a non-word character.
    String[] arr = str.split("(?=[\\s\\W])|(?<=[\\s\\W])");
    System.out.println(Arrays.asList(arr));
}
// Prints:
// [public,  , static,  , void,  , main, (, String, [, ],  , args, ), {]

Then you can remove the whitespace-only entries from the array.
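One way to do that cleanup, as a minimal sketch assuming Java 8 or later for the stream API: split with the same regex, then filter out any entry that is blank after trimming. The class name SplitAndFilter and the method name tokenize are placeholders chosen for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitAndFilter {
    // Split before and after every non-word character, then drop blank tokens.
    static List<String> tokenize(String line) {
        return Arrays.stream(line.split("(?=[\\s\\W])|(?<=[\\s\\W])"))
                .filter(token -> !token.trim().isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("public static void main(String[] args){"));
        // Prints: [public, static, void, main, (, String, [, ], args, ), {]
    }
}
```

Unlike the two-pass approach, this keeps the tokens in their original order, which matches the list asked for in the question.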

Upvotes: 2
