Reputation: 153
I need to split each line of text into an array using a loop. The problem is that there's no obvious delimiter to use given the formatting of the text file (which I can't change):
Adam Rippon      New York, NY    77.58144.6163.6780.94
Brandon Mroz     Broadmoor, CO   70.57138.1266.8471.28
Stephen Carriere Boston, MA      64.42138.8368.2770.56
Grant Hochstein  New York, NY    64.62133.8867.4468.44
Keegan Messing   Alaska, AK      61.15136.3071.0266.28
Timothy Dolensky Atlanta, AL     61.76123.0861.3063.78
Max Aaron        Broadmoor, CO   86.95173.4979.4893.51
Jeremy Abbott    Detroit, MI     99.86174.4193.4280.99
Jason Brown      Skokie Value,IL 87.47182.6193.3489.27
Joshua Farris    Broadmoor, CO   78.37169.6987.1783.52
Richard Dornbush All Year, CA    92.04144.3465.8278.52
Douglas Razzano  Coyotes, AZ     75.18157.2580.6976.56
Ross Miner       Boston, MA      71.94152.8772.5380.34
Sean Rabbit      Glacier, CA     60.58122.7656.9066.86
Lukas Kaugars    Broadmoor, CO   64.57114.7550.4766.28
Philip Warren    All Year, CA    55.80113.2457.0258.22
Daniel Raad      Southwest FL    52.98108.0358.6151.42
Scott Dyer       Brooklyn, OH    55.78100.9744.3357.64
Robert PrzepioskiRochester, NY   47.00100.3449.2651.08
Ideally I would like each name to be in [0] (or first name in [0] last name in [1]), each location to be in [2] or also in two different indexes for city and state, and then each score to be in their own index. For each person there are four separate numbers. Like for example Adam Rippon's scores are 77.58, 144.61, 63.67, 80.94
I can't split by spaces because some of the cities have a space in their name (New York would be split into New and York in two different array elements, while Broadmoor would be in one element). I can't split the cities by commas because Southwest FL has no comma. I also can't split the numbers by decimal point because the resulting numbers would be wrong. So is there an easy way to go about doing this? Perhaps a way to split the numbers by the number of decimal places?
Upvotes: 3
Views: 2010
Reputation: 1291
It looks like there is a fixed size for each column. So in your case, column 1 is 17 characters long, the second column is 16 characters long and the last one is 21 characters long.
Now you can simply iterate through the lines and make use of the substring() method. Something like...
String firstColumn = line.substring(0, 17).trim();
String secondColumn = line.substring(17, 33).trim();
String thirdColumn = line.substring(33, line.length()).trim();
To extract the numbers, we could use a regular expression that searches for all numbers with two decimal places.
Pattern pattern = Pattern.compile("(\\d+\\.[0-9]{2})");
Matcher matcher = pattern.matcher(thirdColumn);
while(matcher.find())
{
System.out.println(matcher.group());
}
So in this case, 47.00100.3449.2651.08 will output
47.00
100.34
49.26
51.08
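Putting the two steps together, here is a minimal sketch that turns one line into a single array. The class name LineSplitter and the method splitLine are made up for illustration, and the code assumes the 17- and 16-character name and location columns described above, with every score having exactly two decimal places.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class LineSplitter {
    // One score: one or more digits, a dot, then exactly two decimals.
    private static final Pattern SCORE = Pattern.compile("\\d+\\.\\d{2}");

    // Assumes name = columns 0-16, location = columns 17-32, scores = the rest.
    static String[] splitLine(String line) {
        List<String> fields = new ArrayList<>();
        fields.add(line.substring(0, 17).trim());
        fields.add(line.substring(17, 33).trim());
        Matcher matcher = SCORE.matcher(line.substring(33));
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Pad the sample fields to the assumed column widths.
        String line = String.format("%-17s%-16s%s",
                "Robert Przepioski", "Rochester, NY", "47.00100.3449.2651.08");
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

The array holds the name in [0], the location in [1] and the four scores in [2] to [5].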
Upvotes: 7
Reputation: 81
Why don't you split by index? The scores (called coordinates in the code below) are the tricky part, but if you always have two digits after the decimal point, then this example can help.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class Split {
public static void main(String[] args) throws IOException {
List<Person> lst = new ArrayList<Split.Person>();
BufferedReader br = new BufferedReader(new FileReader("c:\\test\\file.txt"));
try {
String line = null;
while ((line = br.readLine()) != null) {
Person p = new Person();
String[] name = line.substring(0,17).split(" ");
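String location = line.substring(17,33).trim();
p.setName(name[0].trim());
p.setLastname(name[1].trim());
// City names may contain spaces ("New York"), so take the state from the last
// two characters of the field instead of splitting on spaces.
p.setCity(location.substring(0, location.length() - 2).replace(",","").trim());
p.setState(location.substring(location.length() - 2));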
String[] coordinates = new String[4];
String coor = line.substring(33);
String first = coor.substring(0, coor.indexOf(".") + 3);
coor = coor.substring(first.length());
String second = coor.substring(0, coor.indexOf(".") + 3);
coor = coor.substring(second.length());
String third = coor.substring(0, coor.indexOf(".") + 3);
coor = coor.substring(third.length());
String fourth = coor.substring(0, coor.indexOf(".") + 3);
coordinates[0] = first;
coordinates[1] = second;
coordinates[2] = third;
coordinates[3] = fourth;
p.setCoordinates(coordinates);
lst.add(p);
}
} finally {
br.close();
}
for(Person p : lst){
System.out.println(p.getName());
System.out.println(p.getLastname());
System.out.println(p.getCity());
System.out.println(p.getState());
for(String s : p.getCoordinates()){
System.out.println(s);
}
System.out.println();
}
}
public static class Person {
public Person(){}
private String name;
private String lastname;
private String city;
private String state;
private String[] coordinates;
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getLastname() {
return lastname;
}
public void setLastname(String lastname) {
this.lastname = lastname;
}
public String getCity() {
return city;
}
public void setCity(String city) {
this.city = city;
}
public String getState() {
return state;
}
public void setState(String state) {
this.state = state;
}
public String[] getCoordinates() {
return coordinates;
}
public void setCoordinates(String[] coordinates) {
this.coordinates = coordinates;
}
}
}
Upvotes: 0
Reputation: 42020
Read the file line by line, then take substrings of each line at the corresponding column limits, e.g.:
private static String[] split(String line) {
return new String[] {
// substring's end index is exclusive, so each field ends where the next one begins
line.substring(0, 17).trim(),
line.substring(17, 33).trim(),
line.substring(33, 38).trim(),
line.substring(38, 44).trim(),
line.substring(44, 49).trim(),
line.substring(49, 54).trim(),
};
}
Upvotes: 0
Reputation: 34628
This seems to be the good old fixed-position file format. It was highly popular in the days of punch card readers.
So basically, you read this file line by line, and then:
String name = line.substring(0,17).trim();
String location = line.substring(17,33).trim();
String[] scores = new String[4];
scores[0] = line.substring(33,38);
scores[1] = line.substring(38,44);
scores[2] = line.substring(44,49);
scores[3] = line.substring(49,54);
You can then go on and split the name by space, the location by the comma (,), convert the scores into numbers and so on.
If you want to make all of the above more general, you can prepare a list of indexes, and create the array based on those indexes:
int[] fieldIndexes = { 0, 17,33,38,44,49,54 };
String values[] = new String[fieldIndexes.length - 1];
And then in your read loop (again I assume you read the line into line):
for ( int i = 1; i < fieldIndexes.length; i++ ) {
values[i-1] = line.substring(fieldIndexes[i-1],fieldIndexes[i]).trim();
}
And then proceed to work with the values array.
Of course, make sure each line you read has the appropriate number of characters etc. so as to avoid out-of-bounds problems.
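One way to do that length check is sketched below. The class FixedWidthReader and the method parse are hypothetical names; the sketch reuses the fieldIndexes boundaries from above and simply returns an empty Optional for a line that is too short, which is only one of several reasonable ways to handle bad input.

```java
import java.util.Optional;

// Hypothetical helper class illustrating the bounds check.
class FixedWidthReader {
    // Same boundaries as the fieldIndexes array above.
    static final int[] FIELD_INDEXES = {0, 17, 33, 38, 44, 49, 54};

    // Returns the trimmed fields, or Optional.empty() if the line cannot hold all columns.
    static Optional<String[]> parse(String line) {
        int required = FIELD_INDEXES[FIELD_INDEXES.length - 1];
        if (line == null || line.length() < required) {
            return Optional.empty();
        }
        String[] values = new String[FIELD_INDEXES.length - 1];
        for (int i = 1; i < FIELD_INDEXES.length; i++) {
            values[i - 1] = line.substring(FIELD_INDEXES[i - 1], FIELD_INDEXES[i]).trim();
        }
        return Optional.of(values);
    }

    public static void main(String[] args) {
        String good = String.format("%-17s%-16s%s",
                "Adam Rippon", "New York, NY", "77.58144.6163.6780.94");
        System.out.println(parse(good).isPresent());        // line is long enough
        System.out.println(parse("too short").isPresent()); // line is rejected
    }
}
```

In the read loop you could then skip or log any line for which parse returns an empty result.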
Upvotes: 0
Reputation: 506
Assuming the fields are fixed width, which is what they appear to be, you can do substring operations to get each field and then parse accordingly. Something like:
String name = line.substring(0, x);
String city_state = line.substring(x, y);
String num1 = line.substring(y, z);
Etc. where the x, y and z are the column breaks.
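For the "parse accordingly" part, the score fields can be converted to numbers with Double.parseDouble. This is only a sketch: ScoreParser and parseScores are invented names, and the break positions 33, 38, 44, 49 and 54 are an assumption read off the sample data (a two-digit first score and a three-digit second score on every line), so they would need adjusting if other lines differ.

```java
// Hypothetical helper class; the column breaks are assumed from the sample data.
class ScoreParser {
    // Start of each score field, plus the end of the last one.
    static final int[] BREAKS = {33, 38, 44, 49, 54};

    // Cuts the four score fields out of a full line and parses them as doubles.
    static double[] parseScores(String line) {
        double[] scores = new double[BREAKS.length - 1];
        for (int i = 0; i < scores.length; i++) {
            String field = line.substring(BREAKS[i], BREAKS[i + 1]).trim();
            scores[i] = Double.parseDouble(field);
        }
        return scores;
    }

    public static void main(String[] args) {
        String line = String.format("%-17s%-16s%s",
                "Adam Rippon", "New York, NY", "77.58144.6163.6780.94");
        for (double score : parseScores(line)) {
            System.out.println(score);
        }
    }
}
```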
Upvotes: 0
Reputation: 2812
It looks like each column has a fixed size (number of characters). As you already said you cannot split by tabs or spaces because of the last line where there is no tab or space between name and city.
I propose to read one line and then split the String with line.substring(startIndex, endIndex). For example, line.substring(0, 17) for the name (the longest name, Robert Przepioski, is exactly 17 characters). Then you can split this name into first and last name by using the space as the delimiter.
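A rough sketch of that second split, extended to the location field as well. FieldSplitter, splitName and splitLocation are hypothetical names. The location split assumes the state is always the last two characters of the trimmed field, which holds for the sample data (including "Southwest FL" with no comma and "Skokie Value,IL" with no space) but is not guaranteed for other files.

```java
// Hypothetical helper class for splitting the already extracted fields.
class FieldSplitter {
    // Splits at the first space only; the limit of 2 keeps any later spaces in the last name.
    static String[] splitName(String name) {
        return name.trim().split(" ", 2);
    }

    // Assumption: the state is the final two characters of the trimmed location.
    static String[] splitLocation(String location) {
        String loc = location.trim();
        String state = loc.substring(loc.length() - 2);
        String city = loc.substring(0, loc.length() - 2).replace(",", "").trim();
        return new String[] {city, state};
    }

    public static void main(String[] args) {
        String[] name = splitName("Robert Przepioski");
        String[] location = splitLocation("New York, NY    ");
        System.out.println(name[0] + " | " + name[1]);
        System.out.println(location[0] + " | " + location[1]);
    }
}
```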
Upvotes: 1