Dinesh

Reputation: 219

Split a string based on the length of individual fields

I am trying to parse a text file and extract the variables from it. This is the code I use to read the data into a string.

File file = new File(p);
BufferedReader reader = new BufferedReader(new FileReader(file));

while ((line = reader.readLine()) != null) {
    oldtext += line;
}
reader.close();

EDIT: Each record in the file consists of a fixed-length field name, the length of the value, and the value itself.

For example, a field name padded to 10 characters, followed by a single-digit value length, and then the value:

fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef

The expected output stores each field name and its value as a key-value pair:

fieldOne : abcde
fieldTwo : abc
fieldThree : abcdef

Is there a way to write a regex pattern to split the string? I searched for this kind of variable-length split, but couldn't find anything.

If a pattern-based split is not possible, I will have to write a loop that reads the field name and the value length, and then splits by index.
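The loop-based fallback described above can be sketched as follows. This is only a minimal sketch, assuming the layout from the example: names padded to exactly 10 characters and a single-digit value length. The class name `FixedWidthFields`, the method `parse`, and the constant `NAME_WIDTH` are hypothetical names chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedWidthFields {
    // Assumption from the example: every field name is padded to 10 characters.
    static final int NAME_WIDTH = 10;

    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>(); // preserves file order
        int pos = 0;
        while (pos < text.length()) {
            // A header is the padded name plus one length digit.
            if (pos + NAME_WIDTH + 1 > text.length()) {
                throw new IllegalArgumentException("Truncated field header at index " + pos);
            }
            String name = text.substring(pos, pos + NAME_WIDTH).trim();
            char lengthChar = text.charAt(pos + NAME_WIDTH);
            if (lengthChar < '0' || lengthChar > '9') {
                throw new IllegalArgumentException(
                        "Expected a length digit at index " + (pos + NAME_WIDTH));
            }
            int valueStart = pos + NAME_WIDTH + 1;
            int valueEnd = valueStart + (lengthChar - '0');
            if (valueEnd > text.length()) {
                throw new IllegalArgumentException("Truncated value for field " + name);
            }
            fields.put(name, text.substring(valueStart, valueEnd));
            pos = valueEnd; // the next record starts right after this value
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef";
        parse(input).forEach((name, value) -> System.out.println(name + " : " + value));
    }
}
```

Because the value length is read from the data, the loop never has to guess where a value ends, and malformed input fails with an explicit error instead of being silently mis-split.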

Upvotes: 2

Views: 124

Answers (3)

anubhava

Reputation: 785196

You can use this regex to capture each field, length, value combination from the input:

(\w[\w\s]{9})(\d)(.+?(?=\w[\w\s]{9}\d|$))
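  • (\w[\w\s]{9}) - Matches a field name of exactly 10 characters: one word character followed by 9 word or whitespace characters
  • (\d) - Matches the single-digit value length
  • (.+?(?=\w[\w\s]{9}\d|$)) - Lazily matches the value, stopping where a positive lookahead finds either the next field name plus length digit or the end of the line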


Code:

final String regex = "(\\w[\\w\\s]{9})(\\d)(.+?(?=\\w[\\w\\s]{9}\\d|$))";
final String string = "fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef";

final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);

while (matcher.find()) {
    // group(2) is a String, so parse it before formatting it with %d
    System.out.printf("Field: <%s> Len: <%d> Value: <%s>%n",
           matcher.group(1).trim(), Integer.parseInt(matcher.group(2)), matcher.group(3));
}
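Note that the pattern above captures the length digit but locates the end of each value with the lookahead. For an input such as "fieldOne  6abcdeffield2    3xyz", where a field name contains a digit, the lookahead can succeed one character into the value and cut it short. A possible variant, sketched here under the same 10-character-name and single-digit-length assumptions, matches only the header with a regex and then uses the captured length to slice the value. The names `HeaderRegexParser`, `HEADER`, and `parse` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderRegexParser {
    // Matches only the header: 10 name characters followed by one length digit.
    private static final Pattern HEADER = Pattern.compile("(.{10})(\\d)");

    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = HEADER.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            // Restrict matching to the unread part; lookingAt() anchors at its start.
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("No field header at index " + pos);
            }
            int valueLength = Integer.parseInt(m.group(2));
            int valueEnd = m.end() + valueLength; // m.end() is an index into the whole input
            if (valueEnd > text.length()) {
                throw new IllegalArgumentException("Truncated value at index " + m.end());
            }
            fields.put(m.group(1).trim(), text.substring(m.end(), valueEnd));
            pos = valueEnd;
        }
        return fields;
    }
}
```

With this approach the value may contain any characters, including digits or text that looks like a field header, because its end is computed rather than matched.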

Upvotes: 1

Mustofa Rizwan

Reputation: 10466

With your edited question, this is now possible.

Use this regex:

([^\d]{10})(\d)(.*?)


Try this:

final String pat = "([^\\d]{10})(\\d)(.*?)";
final String string = "fieldOne  5abcdefieldTwo  3abcfieldThree6abcdef";

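Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(string);
// split() treats each "name + length digit" header as a delimiter, so the
// values are what remains; val[0] is the empty string before the first header
String[] val = string.split(pat);

int cnt = 0;
while (m.find()) {
    // the n-th header found pairs with val[n], hence the pre-increment
    System.out.println(m.group(1).trim() + " : " + val[++cnt]);
}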


Sample output:

fieldOne : abcde
fieldTwo : abc
fieldThree : abcdef

Upvotes: 2

Horia Coman

Reputation: 8781

There's no regular expression which will properly split this string for you. What you'd want is something like [a-zA-Z]+(?group1:[0-9]+)[a-zA-Z]{\group1} in pseudo-regex syntax. Unfortunately, standard regular expressions don't offer this kind of behaviour, and the various extensions (PCRE, re2, etc.) don't either.

In fact, the language you're describing doesn't seem to be regular. If you tried to build an automaton for it by hand, you'd find that you need some sort of memory when parsing the numeric part. My automata theory is rusty, but the language might not even be context-free.


Also, check that you don't have ambiguities. Is something like position12ab allowed to result in position1 : ab or will it error out?
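One way to probe for such ambiguities is to enumerate every reading of a single record. The sketch below is illustrative only and assumes a hypothetical grammar of name, then a decimal length, then a value that has exactly that many characters and fills the rest of the record. The class `AmbiguityCheck` and method `interpretations` are made-up names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

class AmbiguityCheck {
    // Lists every way to read `record` as name + decimal length + value,
    // where the value is exactly `length` characters and ends the record.
    static List<String> interpretations(String record) {
        List<String> result = new ArrayList<>();
        for (int nameEnd = 1; nameEnd < record.length(); nameEnd++) {
            for (int lenEnd = nameEnd + 1; lenEnd <= record.length(); lenEnd++) {
                String digits = record.substring(nameEnd, lenEnd);
                boolean allDigits = digits.chars().allMatch(c -> c >= '0' && c <= '9');
                // Stop extending once a non-digit appears; cap at 9 digits to avoid int overflow.
                if (!allDigits || digits.length() > 9) {
                    break;
                }
                int length = Integer.parseInt(digits);
                if (record.length() - lenEnd == length) {
                    result.add(record.substring(0, nameEnd) + " : " + record.substring(lenEnd));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(interpretations("position12ab")); // [position1 : ab]
        System.out.println(interpretations("x212"));         // [x : 12, x2 : 2]
    }
}
```

Under these assumptions "position12ab" has a single valid reading, because a length of 12 does not fit the two remaining characters, while "x212" has two. A real format needs a rule, such as fixed-width names or digit-free names, to rule out the second kind of input.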

Upvotes: 1
