Reputation: 219
I am trying to parse a text file and extract the variables from it. This is the code I use to read the file into a string:
File file = new File(p);
BufferedReader reader = new BufferedReader(new FileReader(file));
while ((line = reader.readLine()) != null) {
    oldtext += line;
}
reader.close();
EDIT: Each record in the file consists of a fixed-length field name, the length of the value, and the value.
For example, a field name padded to 10 characters is followed by a single-digit value length and then the value itself:
fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef
The expected output stores each field name and its value as a key-value pair:
fieldOne : abcde
fieldTwo : abc
fieldThree : abcdef
Is there a way to write a regex pattern to split the string? I searched for this kind of variable-length split but couldn't find anything.
If a pattern split is not possible, I will have to write a loop that reads the field name and the length of the value, then splits by index.
Upvotes: 2
Views: 124
Reputation: 785196
You can use this regex to capture each field, length, value combination from the input:
(\w[\w\s]{9})(\d)(.+?(?=\w[\w\s]{9}\d|$))
(\w[\w\s]{9}) - Matches a field name of exactly 10 characters
(\d) - Matches the single-digit field length
(.+?(?=\w[\w\s]{9}\d|$)) - Matches the value lazily; (?=...) is a positive lookahead that asserts we have field:len ahead or we have end of line.
Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final String regex = "(\\w[\\w\\s]{9})(\\d)(.+?(?=\\w[\\w\\s]{9}\\d|$))";
// Field names are space-padded to 10 characters
final String string = "fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // group(2) is a String, so format it with %s rather than %d
    System.out.printf("Field: <%s> Len: <%s> Value: <%s>%n",
            matcher.group(1).trim(), matcher.group(2), matcher.group(3));
}
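Since the question asks for key-value pairs rather than printed lines, the matches can also be collected into a map. The following is only a sketch under the assumptions of the question's edit (10-character padded name, one-digit length); the class name RegexFieldCollector and the method collect are made up for illustration. Because the regex itself never uses the captured length, the sketch compares it with the matched value and fails loudly on a mismatch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class RegexFieldCollector {
    // Same pattern as above: 10-char name, 1-digit length, lazily matched value.
    private static final Pattern FIELD =
            Pattern.compile("(\\w[\\w\\s]{9})(\\d)(.+?(?=\\w[\\w\\s]{9}\\d|$))");

    static Map<String, String> collect(String input) {
        // LinkedHashMap keeps the fields in the order they appear in the input.
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            int declared = Integer.parseInt(m.group(2));
            String value = m.group(3);
            // The lookahead only guesses where the value ends, so verify the guess.
            if (value.length() != declared) {
                throw new IllegalArgumentException(
                        "Field '" + m.group(1).trim() + "' declares length "
                                + declared + " but matched '" + value + "'");
            }
            fields.put(m.group(1).trim(), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(collect("fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef"));
    }
}
```

The length check matters when a value happens to contain text that looks like a field header; in that case the lazy match would stop early and the mismatch is reported instead of silently producing a wrong pair.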
Upvotes: 1
Reputation: 10466
Now it is possible with your edited question.
Use this regex:
([^\d]{10})(\d)(.*?)
Try this:
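import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Ten non-digit characters (the padded name), then the one-digit length;
// the trailing lazy (.*?) matches nothing, so split() leaves only the values.
final String pat = "([^\\d]{10})(\\d)(.*?)";
final String string = "fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(string);
String[] val = string.split(pat); // val[0] is the empty string before the first header
int cnt = 0;
while (m.find()) {
    System.out.println(m.group(1).trim() + " : " + val[++cnt]);
}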
Sample output:
fieldOne : abcde
fieldTwo : abc
fieldThree : abcdef
Upvotes: 2
Reputation: 8781
There's no regular expression which will properly split this string for you. What you'd want is something like [a-zA-Z]+(?group1:[0-9]+)[a-zA-Z]{\group1} in pseudo-regex syntax. Unfortunately, ordinary regular expressions don't offer this kind of behaviour, and the various extensions (PCRE, re2 etc.) don't either.
In fact, the language you're describing doesn't seem to be regular. If you tried to build an automaton for it by hand, you'd find that you need some sort of memory when parsing the numbers part. My automata theory is rusty, but the thing might not even be context-free.
Also, check that you don't have ambiguities. Is something like position12ab allowed to result in position1 : ab, or will it error out?
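For what it's worth, a plain loop that reads the length and slices by index avoids both the memory problem and the ambiguity, because the length digit decides where each value ends. Here is a minimal sketch, assuming the layout from the question's edit (field name padded to exactly 10 characters, then a single-digit length, then the value). The names FixedWidthParser, parse and NAME_WIDTH are hypothetical, chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating index-based parsing; not from the question.
class FixedWidthParser {
    // Assumption taken from the question: names are padded to 10 characters.
    private static final int NAME_WIDTH = 10;

    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < input.length()) {
            // A header needs the full name plus one length digit.
            if (pos + NAME_WIDTH + 1 > input.length()) {
                throw new IllegalArgumentException("Truncated field header at index " + pos);
            }
            String name = input.substring(pos, pos + NAME_WIDTH).trim();
            pos += NAME_WIDTH;

            char lenChar = input.charAt(pos);
            if (lenChar < '0' || lenChar > '9') {
                throw new IllegalArgumentException("Expected a length digit at index " + pos);
            }
            int len = lenChar - '0';
            pos++;

            // The declared length, not the content, decides where the value ends.
            if (pos + len > input.length()) {
                throw new IllegalArgumentException("Value for '" + name + "' is truncated");
            }
            fields.put(name, input.substring(pos, pos + len));
            pos += len;
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef"));
    }
}
```

Because the value is cut by its declared length, values that contain digits or text resembling a field name are handled without any special cases, and malformed input produces an error rather than a wrong split.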
Upvotes: 1