Reputation: 471
I am inexperienced with regex and rusty with Java, so some help here would be appreciated.
So I have a String in the form:
statement|digit|statement
statement|digit|statement
etc.
where statement can be any combination of characters, digits, and spaces. I want to parse this string so that the first statements of all lines end up in one string array and the last statements in another.
For example, if I had the string:
cats|1|short hair and long hair
cats|2|black, blue
dogs|1|cats are better than dogs
I want to be able to parse the string into two arrays.
Array one = [cats], [cats], [dogs]
Array two = [short hair and long hair],[black, blue],[cats are better than dogs]
Matcher m = Pattern.compile("(\\.+)|\\d+|=(\\.+)").matcher(str);
while(m.find()) {
String key = m.group(1);
String value = m.group(2);
System.out.printf("key=%s, value=%s\n", key, value);
}
I would have gone on to add the keys and values to separate arrays had my output been right, but no luck. Any help with this would be very much appreciated.
Upvotes: 1
Views: 120
Reputation: 8858
The main problem is that you need to escape the | and not the . character. Also, what is the = doing in your regex? I generalized the regex a little, but you can replace .* with \\d+ to match only digits, as in your version.
Matcher m = Pattern.compile("^(.+?)\\|.*\\|(.+)$", Pattern.MULTILINE).matcher(str);
Here is the strict version: "^([^|]+)\\|\\d+\\|([^|]+)$" (also with MULTILINE).
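A minimal sketch of how the strict pattern could fill the two collections the question asks for. The class name StrictLineParser and the method name parse are made up for illustration, and List.of assumes Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class StrictLineParser {
    // Strict pattern from the answer: neither statement may contain '|'.
    private static final Pattern LINE =
            Pattern.compile("^([^|]+)\\|\\d+\\|([^|]+)$", Pattern.MULTILINE);

    // Returns two lists: index 0 holds the first statements, index 1 the last ones.
    static List<List<String>> parse(String str) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        Matcher m = LINE.matcher(str);
        while (m.find()) {
            first.add(m.group(1)); // text before "|digits|"
            last.add(m.group(2));  // text after "|digits|"
        }
        return List.of(first, last);
    }

    public static void main(String[] args) {
        String str = "cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs";
        List<List<String>> result = parse(str);
        System.out.println("Array one = " + result.get(0));
        System.out.println("Array two = " + result.get(1));
    }
}
```

One caveat: [^|] also matches line breaks, so a malformed line directly above a valid one can end up inside that line's key. Using [^|\n] in both groups should avoid this if the input may contain bad lines.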
And it is indeed easier to use split (on the lines), as some have said, but like this:
String[] parts = str.split("\\|\\d+\\|");
If parts.length is not two, then you know it is not a legal line.
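As a rough sketch, the per-line split and the length check could be combined as follows. The names SplitPerLine and collect are hypothetical, and silently skipping bad lines is just one possible way to handle them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class SplitPerLine {
    // Splits every line on "|digits|" and stores both halves.
    // Lines that do not split into exactly two parts are skipped.
    static void collect(String str, List<String> first, List<String> last) {
        for (String line : str.split("\\n")) {
            String[] parts = line.split("\\|\\d+\\|");
            if (parts.length != 2) {
                continue; // not a legal line
            }
            first.add(parts[0]);
            last.add(parts[1]);
        }
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>();
        List<String> last = new ArrayList<>();
        collect("cats|1|short hair and long hair\n"
                + "not a record\n"
                + "dogs|1|cats are better than dogs", first, last);
        System.out.println(first);
        System.out.println(last);
    }
}
```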
If your input is always formatted like that, a single statement is enough. It puts the left parts at the even indexes and the right parts at the odd indexes (0: line1-left, 1: line1-right, 2: line2-left, 3: line2-right, 4: line3-left ...), so the resulting array is twice as long as the number of lines.
String[] parts = str.split("\\|\\d+\\||\\n+");
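To get from that interleaved array to the two arrays in the question, something like the following sketch might do. SplitAll and parse are illustrative names, and the code assumes every line is well formed, as stated above.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class SplitAll {
    // Returns { firstStatements, lastStatements }.
    static String[][] parse(String str) {
        // Even indexes: left parts, odd indexes: right parts.
        String[] parts = str.split("\\|\\d+\\||\\n+");
        String[] first = new String[parts.length / 2];
        String[] last = new String[parts.length / 2];
        for (int i = 0; i + 1 < parts.length; i += 2) {
            first[i / 2] = parts[i];
            last[i / 2] = parts[i + 1];
        }
        return new String[][] {first, last};
    }

    public static void main(String[] args) {
        String[][] result = parse("cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs");
        System.out.println(Arrays.toString(result[0]));
        System.out.println(Arrays.toString(result[1]));
    }
}
```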
Upvotes: 0
Reputation: 158
Here is a solution with RegEx:
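import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ParseString {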
public static void main(String[] args) {
String data = "cats|1|short hair and long hair\n"+
"cats|2|black, blue\n"+
"dogs|1|cats are better than dogs";
List<String> result1 = new ArrayList<>();
List<String> result2 = new ArrayList<>();
Pattern pattern = Pattern.compile("(.+)\\|\\d+\\|(.+)");
Matcher m = pattern.matcher(data);
while (m.find()) {
String key = m.group(1);
String value = m.group(2);
result1.add(key);
result2.add(value);
System.out.printf("key=%s, value=%s\n", key, value);
}
}
}
Here is a great site to help with regular expressions: http://txt2re.com/. Enter some example text in step 1, select the parts you are interested in in step 2, and select a language in step 3. Then copy, paste, and massage the code that it spits out.
Upvotes: 2
Reputation: 12346
I agree with the other answers that you should use split, but I am providing an answer that uses Pattern.split, since it uses a regex.
import java.util.regex.Pattern;
class MatchExample
{
public static void main (String[] args) {
String[] data = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
Pattern p = Pattern.compile("\\|\\d+\\|");
for(String line: data){
String[] elements = p.split(line);
System.out.println(elements[0] + " // " + elements[1]);
}
}
}
Notice that the pattern will match on one or more digits between two |'s. I see what you are doing with the groupings.
Upvotes: 0
Reputation: 1884
There is no need for a complex regex pattern. You could simply split the string on the delimiter token using the string's split method (String#split()) in Java.
import java.util.ArrayList;
public class StackOverFlow31840211 {
private static final int SENTENCE1_TOKEN_INDEX = 0;
private static final int DIGIT_TOKEN_INDEX = SENTENCE1_TOKEN_INDEX + 1;
private static final int SENTENCE2_TOKEN_INDEX = DIGIT_TOKEN_INDEX + 1;
public static void main(String[] args) {
String[] text = {
"cats|1|short hair and long hair",
"cats|2|black, blue",
"dogs|1|cats are better than dogs"
};
ArrayList<String> arrayOne = new ArrayList<String>();
ArrayList<String> arrayTwo = new ArrayList<String>();
for (String s : text) {
String[] tokens = s.split("\\|");
int tokenType = 0;
for (String token : tokens) {
switch (tokenType) {
case SENTENCE1_TOKEN_INDEX:
arrayOne.add(token);
break;
case SENTENCE2_TOKEN_INDEX:
arrayTwo.add(token);
break;
}
++tokenType;
}
}
System.out.println("Sentences for first token: " + arrayOne);
System.out.println("Sentences for third token: " + arrayTwo);
}
}
Upvotes: 0
Reputation: 2591
Double split should work:
class ParseString
{
public static void main(String[] args)
{
String s = "cats|1|short hair and long hair\ncats|2|black, blue\ndogs|1|cats are better than dogs";
String[] sa1 = s.split("\n");
for (int i = 0; i < sa1.length; i++)
{
String[] sa2 = sa1[i].split("\\|");
System.out.printf("key=%s, value=%s\n", sa2[0], sa2[2]);
} // end for i
} // end main
} // end class ParseString
Output:
key=cats, value=short hair and long hair
key=cats, value=black, blue
key=dogs, value=cats are better than dogs
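The same double split can also fill the two string arrays the question asks for instead of printing. This is only a sketch: DoubleSplitArrays and parse are made-up names, the limit argument of 3 is an addition that keeps any further | characters inside the last statement, and every line is assumed to contain at least two | characters.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of the original answer.
class DoubleSplitArrays {
    // Returns { keys, values }, one entry per input line.
    static String[][] parse(String s) {
        String[] lines = s.split("\n");
        String[] keys = new String[lines.length];
        String[] values = new String[lines.length];
        for (int i = 0; i < lines.length; i++) {
            // At most 3 fields: statement, digit, rest of the line.
            // Assumption: the line has at least two '|' characters.
            String[] fields = lines[i].split("\\|", 3);
            keys[i] = fields[0];
            values[i] = fields[2];
        }
        return new String[][] {keys, values};
    }

    public static void main(String[] args) {
        String[][] result = parse("cats|1|short hair and long hair\n"
                + "cats|2|black, blue\n"
                + "dogs|1|cats are better than dogs");
        System.out.println(Arrays.toString(result[0]));
        System.out.println(Arrays.toString(result[1]));
    }
}
```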
Upvotes: 0