Reputation: 51
I am parsing many lines from a text file. The lines are fixed-width, but the layout depends on the beginning of the line (e.g. "0301...."). There are lines beginning with 11, 34, etc., and each prefix means the line is split differently.
Example: if the line starts with "03", it is split like this:
name = line.substring(2, 10);
surname = line.substring(11, 21);
id = line.substring(22, 34);
address = line.substring(35, 46);
Another example: if the line starts with "24", it is split like this:
name = line.substring(5, 15);
salary = line.substring(35, 51);
empid = line.substring(22, 34);
department = line.substring (35, 46);
So I end up with many substrings assigned to many strings, which are then written to a new CSV file.
My question: is there an easy way to store the coordinates (indexes) of each substring and refer to them later? For example:
name = (2,10);
surname = (11,21);
... etc.
Or is there perhaps an alternative to using substrings? Thank you!
Upvotes: 0
Views: 938
Reputation: 2030
We can also use a regex pattern and streams to achieve this.
Say we have a text file like this:
03SomeNameSomeSurname
24SomeName10000
The regex pattern uses named groups to attach an attribute name to each parsed field. So the pattern for the first line is:
^03(?<name>.{8})(?<surname>.{11})
The code is:
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Class name is arbitrary; it only makes the snippet compile on its own
public class FixedWidthRegexParser {
    public static void main(String[] args) {
        // Fixed-width file lines
        List<String> fileLines = List.of(
                "03SomeNameSomeSurname",
                "24SomeName10000"
        );
        // List all regex patterns for the specific file
        List<Pattern> patternList = List.of(
                Pattern.compile("^03(?<name>.{8})(?<surname>.{11})"), // Regex for String - 03SomeNameSomeSurname
                Pattern.compile("^24(?<name>.{8})(?<salary>.{5})"));  // Regex for String - 24SomeName10000
        // Pattern for finding group names inside a regex source string
        Pattern groupNamePattern = Pattern.compile("\\?<([a-zA-Z0-9]*)>");
        List<List<String>> output = fileLines.stream().map(
                line -> patternList.stream() // Stream over the pattern list
                        .map(pattern -> pattern.matcher(line)) // Create a matcher for the fixed-width line and regex pattern
                        .filter(matcher -> matcher.find()) // Keep only the matchers that match
                        .map( // Transform matcher results into a String (group name = matched value)
                                matcher ->
                                        groupNamePattern.matcher(matcher.pattern().toString()).results() // Find group names in the regex pattern (results() needs Java 9+)
                                                .map(groupNameMatchResult -> groupNameMatchResult.group(1) + "=" + matcher.group(groupNameMatchResult.group(1)))
                                                .collect(Collectors.joining(","))) // Join results delimited with ,
                        .collect(Collectors.toList())
        ).collect(Collectors.toList());
        System.out.println(output);
    }
}
The output holds, for each line, the parsed attribute names and values as a list of strings:
[[name=SomeName,surname=SomeSurname], [name=SomeName,salary=10000]]
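Since the question ends with writing CSV, here is a rough sketch of how the same named-group patterns could produce one CSV row per line. It is not part of the code above: NamedGroupCsv, LAYOUTS and toCsvRow are hypothetical names, and listing the group names next to each pattern (instead of re-scanning the regex source) is just one possible choice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NamedGroupCsv {
    // Hypothetical layout table: each pattern is paired with its group names in CSV column order.
    static final Map<Pattern, List<String>> LAYOUTS = new LinkedHashMap<>();
    static {
        LAYOUTS.put(Pattern.compile("^03(?<name>.{8})(?<surname>.{11})"), List.of("name", "surname"));
        LAYOUTS.put(Pattern.compile("^24(?<name>.{8})(?<salary>.{5})"), List.of("name", "salary"));
    }

    // Returns the comma-joined values of the first matching layout, or "" if no layout matches.
    static String toCsvRow(String line) {
        for (Map.Entry<Pattern, List<String>> layout : LAYOUTS.entrySet()) {
            Matcher matcher = layout.getKey().matcher(line);
            if (matcher.find()) {
                return layout.getValue().stream()
                        .map(groupName -> matcher.group(groupName))
                        .collect(Collectors.joining(","));
            }
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow("03SomeNameSomeSurname"));
        System.out.println(toCsvRow("24SomeName10000"));
    }
}
```

Values that can contain commas or quotes would still need CSV escaping, which this sketch leaves out.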
Upvotes: 0
Reputation: 384
You could try something like this. I'll leave the bounds checking and optimization to you, but as a first pass...
import java.util.HashMap;
import java.util.Map;

// Class name is arbitrary; the line to parse is passed as the first program argument
public class SubstringMappingExample {
    public static void main( String[] args ) {
        Map<String, Map<String,IndexDesignation>> substringMapping = new HashMap<>();
        // Put all the designations of how to map here
        substringMapping.put( "03", new HashMap<>());
        substringMapping.get( "03" ).put( "name", new IndexDesignation(2,10));
        substringMapping.get( "03" ).put( "surname", new IndexDesignation(11,21));
        // This determines which mapping to use, based on the two-character prefix
        Map<String,IndexDesignation> indexDesignationMap = substringMapping.get(args[0].substring(0,2));
        // This holds the results
        Map<String, String> resultsMap = new HashMap<>();
        // Make sure we actually have a map to use
        if ( indexDesignationMap != null ) {
            // Now take this particular map designation and turn it into the resulting map of name to values
            for ( Map.Entry<String,IndexDesignation> mapEntry : indexDesignationMap.entrySet() ) {
                resultsMap.put(mapEntry.getKey(), args[0].substring(mapEntry.getValue().startIndex,
                        mapEntry.getValue().endIndex));
            }
        }
        // Print out the results (and you can assign to another object here as needed)
        System.out.println( resultsMap );
    }

    // Could also just use a list of two elements instead of this
    static class IndexDesignation {
        int startIndex;
        int endIndex;
        public IndexDesignation( int startIndex, int endIndex ) {
            this.startIndex = startIndex;
            this.endIndex = endIndex;
        }
    }
}
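The bounds checking left to the reader above could look roughly like this. It is only a sketch: SafeSlice and slice are hypothetical names, and clamping the indexes and trimming the padding are assumptions about how short or padded lines should be treated.

```java
class SafeSlice {
    // Hypothetical helper: clamps both indexes to the line length,
    // so a line that is too short yields "" instead of throwing.
    static String slice(String line, int startIndex, int endIndex) {
        int start = Math.max(0, Math.min(startIndex, line.length()));
        int end = Math.max(start, Math.min(endIndex, line.length()));
        // Fixed-width fields are usually space-padded, so trim the result (an assumption).
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + slice("03John", 2, 10) + "]");  // field cut short by the line end
        System.out.println("[" + slice("03John", 11, 21) + "]"); // field entirely past the line end
    }
}
```

Calling slice in place of the raw substring call inside the loop would keep a truncated line from aborting the whole run. The prefix lookup with substring(0,2) would still need its own length check.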
Upvotes: 1
Reputation: 2502
Create a class called Line and store these objects rather than the string:
class Line {
    int[] name;
    int[] surname;
    int[] id;
    int[] address;
    String line;

    public Line(String line) {
        this.line = line;
        // The record code is the first two characters, e.g. "03" or "24"
        String startCode = line.substring(0, 2);
        switch (startCode) {
            case "03":
                this.name = new int[]{2, 10};
                this.surname = new int[]{11, 21};
                this.id = new int[]{22, 34};
                this.address = new int[]{35, 46};
                break;
            case "24":
                // same thing with different indices
                break;
            // add more cases
        }
    }

    public String getName() {
        return this.line.substring(this.name[0], this.name[1]);
    }

    public String getSurname() {
        return this.line.substring(this.surname[0], this.surname[1]);
    }

    public String getId() {
        return this.line.substring(this.id[0], this.id[1]);
    }

    public String getAddress() {
        return this.line.substring(this.address[0], this.address[1]);
    }
}
Then:
String line = "03 ...";
Line parsed = new Line(line);
parsed.getName();
parsed.getSurname();
...
If you're going to retrieve the name, surname etc. multiple times from the Line object, you can even cache it the first time so that you're not calling substring multiple times.
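A minimal sketch of that caching idea, shown for the name field only. CachedLine is a hypothetical stand-in for the Line class, and the indexes 2 and 10 are the "03" values from the question.

```java
class CachedLine {
    private final String line;
    private String name; // stays null until getName() is first called

    CachedLine(String line) {
        this.line = line;
    }

    String getName() {
        if (name == null) {
            // Computed once; later calls return the stored value
            name = line.substring(2, 10);
        }
        return name;
    }

    public static void main(String[] args) {
        CachedLine parsed = new CachedLine("03ABCDEFGHrest-of-line");
        System.out.println(parsed.getName());
        System.out.println(parsed.getName() == parsed.getName()); // same cached object both times
    }
}
```

The same pattern would apply to the other getters. Whether it is worth it depends on how often each field is actually read.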
Upvotes: 1