user2665166

Reputation: 471

How do I split/parse this String properly using Regex

I am inexperienced with regex and rusty with Java, so some help here would be appreciated.

So I have a String in the form:

statement|digit|statement

statement|digit|statement

etc.

where statement can be any combination of characters, digits, and spaces. I want to parse this string so that the first and last statements of each line are saved in two separate string arrays.

For example, if I had the string:

cats|1|short hair and long hair

cats|2|black, blue

dogs|1|cats are better than dogs

I want to be able to parse the string into two arrays.

Array one = [cats], [cats], [dogs]

Array two = [short hair and long hair],[black, blue],[cats are better than dogs]

    Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);

        while(m.find()) {
          String key = m.group(1);
          String value = m.group(2);
          System.out.printf("key=%s, value=%s\n", key, value);
        }

I would have continued by adding the keys and values to separate arrays had my output been right, but no luck. Any help with this would be very much appreciated.

Upvotes: 1

Views: 120

Answers (5)

maraca

Reputation: 8858

The main problem is that you need to escape the | and not the . (an unescaped | means "or" in a regex). Also, what is the = doing in your regex? I generalized the regex a little, but you can replace .* with \\d+ to require digits in the middle field, as your pattern does.
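To see why the original pattern produces no useful output, here is a small sketch that runs the question's pattern unchanged on the sample input and records what each find() matches. The class name OriginalPatternDemo and the helper describeMatches are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalPatternDemo {
    // The question's pattern, unchanged. The unescaped | splits it into three alternatives:
    //   (\.+)    one or more literal dots, captured as group 1
    //   \d+      one or more digits, not captured
    //   =(\.+)   '=' followed by one or more literal dots, captured as group 2
    private static final Pattern ORIGINAL = Pattern.compile("(\\.+)|\\d+|=(\\.+)");

    // Hypothetical helper: describes every match found in str.
    static List<String> describeMatches(String str) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(str);
        while (m.find()) {
            out.add(String.format("matched=%s, key=%s, value=%s", m.group(), m.group(1), m.group(2)));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                   + "cats|2|black, blue\n"
                   + "dogs|1|cats are better than dogs";
        describeMatches(str).forEach(System.out::println);
    }
}
```

The sample text contains no dots and no '=', so only the bare \d+ alternative matches (the digits 1, 2 and 1). Both capture groups are therefore null on every iteration.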

Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);

Here is the strict version: "^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE).
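Here is a minimal sketch that uses the strict pattern with MULTILINE to fill the two arrays the question asks for. The class name MultilineParse and the {first, last} return shape are illustrative assumptions, not part of the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineParse {
    // The strict pattern; MULTILINE makes ^ and $ match at each line's start and end.
    private static final Pattern LINE =
            Pattern.compile("^([^|]+)\\|\\d+\\|([^|]+)$", Pattern.MULTILINE);

    // Returns {first statements, last statements} for every matching line.
    static String[][] parse(String str) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        Matcher m = LINE.matcher(str);
        while (m.find()) {
            first.add(m.group(1));
            last.add(m.group(2));
        }
        return new String[][] { first.toArray(new String[0]), last.toArray(new String[0]) };
    }

    public static void main(String[] args) {
        String[][] arrays = parse("cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs");
        System.out.println(Arrays.toString(arrays[0]));
        System.out.println(Arrays.toString(arrays[1]));
    }
}
```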

It is indeed easier to use split on each line, as others have said, but like this:

String[] parts = str.split("\\|\\d+\\|");

If parts.length is not 2, you know the line is not legal.
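As a sketch of the per-line approach, the code below splits the input on newlines, applies the |digits| split to each line, and rejects any line that does not produce exactly two parts. The class name SplitPerLine and the choice to throw IllegalArgumentException are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitPerLine {
    // Returns {first statements, last statements}.
    // Throws if any line does not split into exactly two parts around |digits|.
    static String[][] parse(String str) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        for (String line : str.split("\n")) {
            String[] parts = line.split("\\|\\d+\\|");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Not a legal line: " + line);
            }
            first.add(parts[0]);
            last.add(parts[1]);
        }
        return new String[][] { first.toArray(new String[0]), last.toArray(new String[0]) };
    }

    public static void main(String[] args) {
        String[][] arrays = parse("cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs");
        System.out.println(Arrays.toString(arrays[0]));
        System.out.println(Arrays.toString(arrays[1]));
    }
}
```

A line such as "cats|x|black" has no |digits| separator, so it comes back as a single part and is rejected.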

If your input is always formatted like that, this single statement is enough. It puts the left part at the even indexes and the right part at the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left ...), so you get an array twice the size of the line count.

String[] parts = str.split("\\|\\d+\\||\\n+");
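To get back to the two arrays from the question, the interleaved result can be split by index parity. This is a minimal sketch that assumes well-formed input. The class name SingleSplit is hypothetical, and the odd-length check is only a cheap sanity check, not full validation.

```java
import java.util.Arrays;

class SingleSplit {
    // Splits on either the |digits| separator or a run of newlines, then
    // de-interleaves: even indexes are first statements, odd indexes are last statements.
    static String[][] parse(String str) {
        String[] parts = str.split("\\|\\d+\\||\\n+");
        if (parts.length % 2 != 0) {
            throw new IllegalArgumentException("Input is not strictly statement|digits|statement per line");
        }
        String[] first = new String[parts.length / 2];
        String[] last = new String[parts.length / 2];
        for (int i = 0; i < first.length; i++) {
            first[i] = parts[2 * i];
            last[i] = parts[2 * i + 1];
        }
        return new String[][] { first, last };
    }

    public static void main(String[] args) {
        String[][] arrays = parse("cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs");
        System.out.println(Arrays.toString(arrays[0]));
        System.out.println(Arrays.toString(arrays[1]));
    }
}
```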

Upvotes: 0

Eric Wetmiller

Reputation: 158

Here is a solution with RegEx:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseString {
    public static void main(String[] args) {
        String data = "cats|1|short hair and long hair\n"+
                      "cats|2|black, blue\n"+
                      "dogs|1|cats are better than dogs";
        List<String> result1 = new ArrayList<>();
        List<String> result2 = new ArrayList<>();
        // '.' does not match '\n' by default, so each match stays within one line
        Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");

        Matcher m = pattern.matcher(data);
        while (m.find()) {
            String key = m.group(1);
            String value = m.group(2);
            result1.add(key);
            result2.add(value);
            System.out.printf("key=%s, value=%s\n", key, value);
        }
    }
}

Here is a great site to help with regular expressions: http://txt2re.com/. Enter some example text in step 1, select the parts you want in step 2, and select a language in step 3. Then copy, paste, and adapt the code it generates.

Upvotes: 2

matt

Reputation: 12346

I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.

import java.util.regex.Pattern;

class MatchExample
{
    public static void main (String[] args) {
        String[] data = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };
        Pattern p = Pattern.compile("\\|\\d+\\|");
        for(String line: data){

            String[] elements = p.split(line);
            System.out.println(elements[0] + " // " + elements[1]);

        }
    }
}

Notice that the pattern matches one or more digits between two | characters. I see what you were trying to do with the groupings.
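Here is a small sketch of two points about the Pattern.split approach. The compiled Pattern can be reused for every line, whereas String.split with a multi-character regex like this one compiles the pattern again on each call. And because of \d+, a middle field of any length still works. The class PatternSplitDemo and its split helper are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternSplitDemo {
    // Compiled once and reused for every line.
    private static final Pattern SEPARATOR = Pattern.compile("\\|\\d+\\|");

    static String[] split(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        // \d+ accepts any number of digits, so a multi-digit middle field still splits in two.
        System.out.println(Arrays.toString(split("cats|12|short hair")));
        // A non-numeric middle field is not a separator, so the line comes back whole.
        System.out.println(Arrays.toString(split("cats|x|short hair")));
    }
}
```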

Upvotes: 0

Eder

Reputation: 1884

There is no need for a complex regex pattern. You can simply split each line on the delimiter using Java's String#split() method. The | still has to be escaped, because split takes a regex.

Working Example

import java.util.ArrayList;

public class StackOverFlow31840211 {
    private static final int SENTENCE1_TOKEN_INDEX = 0;
    private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
    private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;

    public static void main(String[] args) {
        String[] text = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };

        ArrayList<String> arrayOne = new ArrayList<String>();
        ArrayList<String> arrayTwo = new ArrayList<String>();

        for (String s : text) {
            String[] tokens = s.split("\\|");

            int tokenType = 0;
            for (String token : tokens) {
                switch (tokenType) {
                    case SENTENCE1_TOKEN_INDEX:
                        arrayOne.add(token);
                        break;

                    case SENTENCE2_TOKEN_INDEX:
                        arrayTwo.add(token);
                        break;
                }

                ++tokenType;
            }
        }

        System.out.println("Sentences for first token: " + arrayOne);
        System.out.println("Sentences for third token: " + arrayTwo);

    }
}
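The switch above ends up taking tokens[0] and tokens[2]. The sketch below makes that explicit, using a hypothetical helper firstAndLast. It passes a split limit of 3 so that the last statement would survive intact even if it contained a |. That is an assumption beyond the question, which only mentions characters, digits, and spaces.

```java
import java.util.ArrayList;
import java.util.List;

class SplitWithLimit {
    // Returns {first statement, last statement} for one line.
    static String[] firstAndLast(String line) {
        // A limit of 3 splits at most twice, so everything after the second '|'
        // stays together in tokens[2], even if it contains another '|'.
        String[] tokens = line.split("\\|", 3);
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Expected statement|digit|statement: " + line);
        }
        return new String[] { tokens[0], tokens[2] };
    }

    public static void main(String[] args) {
        String[] text = {
            "cats|1|short hair and long hair",
            "cats|2|black, blue",
            "dogs|1|cats are better than dogs"
        };
        List<String> arrayOne = new ArrayList<>();
        List<String> arrayTwo = new ArrayList<>();
        for (String s : text) {
            String[] pair = firstAndLast(s);
            arrayOne.add(pair[0]);
            arrayTwo.add(pair[1]);
        }
        System.out.println("Sentences for first token: " + arrayOne);
        System.out.println("Sentences for third token: " + arrayTwo);
    }
}
```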

Upvotes: 0

Andrey

Reputation: 2591

A double split should work: first split on newlines, then split each line on |.

class ParseString
{  
  public static void main(String[] args)
  {  
    String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
    String[] sa1 = s.split("\n");
    for (int i = 0; i < sa1.length; i++)
    {  
      String[] sa2 = sa1[i].split("\\|");
      System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
    } // end for i
  } // end main
} // end class ParseString

Output:

key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs
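If the input might use Windows line endings, splitting on "\n" would leave a trailing '\r' on each value except the last. Here is a minimal variation, assuming Java 8 or later, that splits on \R (any line break) instead. The class name DoubleSplitAnyNewline and the values helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleSplitAnyNewline {
    // Collects the third field of every line.
    static List<String> values(String s) {
        List<String> values = new ArrayList<>();
        // \R (Java 8+) matches any line break, including "\r\n",
        // so Windows line endings do not leave a trailing '\r' on the value.
        for (String line : s.split("\\R")) {
            String[] fields = line.split("\\|");
            values.add(fields[2]);
        }
        return values;
    }

    public static void main(String[] args) {
        String s = "cats|1|short hair and long hair\r\ncats|2|black, blue\r\ndogs|1|cats are better than dogs";
        for (String v : values(s)) {
            System.out.println("value=" + v);
        }
    }
}
```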

Upvotes: 0
