sam

Reputation: 153

Java - Splitting text into array without obvious delimiter

I need to split each line of text into an array using a loop. The problem is that there's no obvious delimiter to use given the formatting of the text file (which I can't change):

Adam Rippon      New York, NY    77.58144.6163.6780.94
Brandon Mroz     Broadmoor, CO   70.57138.1266.8471.28
Stephen Carriere Boston, MA      64.42138.8368.2770.56
Grant Hochstein  New York, NY    64.62133.8867.4468.44
Keegan Messing   Alaska, AK      61.15136.3071.0266.28
Timothy Dolensky Atlanta, AL     61.76123.0861.3063.78
Max Aaron        Broadmoor, CO   86.95173.4979.4893.51
Jeremy Abbott    Detroit, MI     99.86174.4193.4280.99
Jason Brown      Skokie Value,IL 87.47182.6193.3489.27
Joshua Farris    Broadmoor, CO   78.37169.6987.1783.52
Richard Dornbush All Year, CA    92.04144.3465.8278.52
Douglas Razzano  Coyotes, AZ     75.18157.2580.6976.56
Ross Miner       Boston, MA      71.94152.8772.5380.34
Sean Rabbit      Glacier, CA     60.58122.7656.9066.86
Lukas Kaugars    Broadmoor, CO   64.57114.7550.4766.28
Philip Warren    All Year, CA    55.80113.2457.0258.22
Daniel Raad      Southwest FL    52.98108.0358.6151.42
Scott Dyer       Brooklyn, OH    55.78100.9744.3357.64
Robert PrzepioskiRochester, NY   47.00100.3449.2651.08

Ideally, each name would go in [0] (or first name in [0] and last name in [1]), each location in [2] (or split across two indexes for city and state), and each score in its own index. Each person has four separate scores; for example, Adam Rippon's are 77.58, 144.61, 63.67, and 80.94.

I can't split by spaces because some of the cities have a space in their name (New York would be split into New and York in two array elements, while Broadmoor would be in one). I can't split the cities by commas because Southwest FL has no comma. I also can't split the numbers by the decimal point because the resulting numbers would be wrong. Is there an easy way to do this, perhaps by splitting the numbers by the number of decimal places?

Upvotes: 3

Views: 2010

Answers (6)

kevcodez

Reputation: 1291

It looks like there is a fixed size for each column. So in your case, column 1 is 17 characters long, the second column is 16 characters long and the last one is 21 characters long.

Now you can simply iterate through the lines and make use of the substring() method. Something like...

String firstColumn = line.substring(0, 17).trim();
String secondColumn = line.substring(17, 33).trim();
String thirdColumn = line.substring(33, line.length()).trim();

To extract the numbers, we could use a regular expression that searches for all numbers with two decimal places.

Pattern pattern = Pattern.compile("(\\d+\\.[0-9]{2})");

Matcher matcher = pattern.matcher(thirdColumn);

while(matcher.find())
{
    System.out.println(matcher.group());
}

So in this case 47.00100.3449.2651.08 will output

47.00
100.34
49.26
51.08
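Putting the two steps together, here is a minimal sketch, assuming every score has exactly two decimal places and the column widths are 17, 16 and 21 as counted above. The class name ScoreRegexSketch and the helper extractScores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScoreRegexSketch {

    // One score = one or more digits, a dot, then exactly two decimals.
    private static final Pattern SCORE = Pattern.compile("\\d+\\.\\d{2}");

    // Hypothetical helper: collects every score found in the run-together third column.
    static List<String> extractScores(String thirdColumn) {
        List<String> scores = new ArrayList<>();
        Matcher matcher = SCORE.matcher(thirdColumn);
        while (matcher.find()) {
            scores.add(matcher.group());
        }
        return scores;
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        String line = "Robert PrzepioskiRochester, NY   47.00100.3449.2651.08";

        String name = line.substring(0, 17).trim();      // column 1: 17 characters
        String location = line.substring(17, 33).trim(); // column 2: 16 characters
        List<String> scores = extractScores(line.substring(33).trim());

        System.out.println(name + " | " + location + " | " + scores);
    }
}
```

The regex only works because the number of decimals is fixed; if a score could have one or three decimals, the matches would drift.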

Upvotes: 7

Hoku

Reputation: 81

Why don't you split by index? The scores (called coordinates in the code) are the tricky part, but if you always have two digits after the decimal point, then this example can help.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;


public class Split {

    public static void main(String[] args) throws IOException {

        List<Person> lst = new ArrayList<Split.Person>();

        BufferedReader br = new BufferedReader(new FileReader("c:\\test\\file.txt"));

        try {
            String line = null;

            while ((line = br.readLine()) != null) {

                Person p = new Person();

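                // Split the name on the first space only.
                String[] name = line.substring(0,17).trim().split(" ", 2);
                String location = line.substring(17,33).trim();

                p.setName(name[0]);
                p.setLastname(name[1]);
                // The state is the last two letters; the city is whatever precedes it.
                // This copes with "New York, NY", "Skokie Value,IL" and "Southwest FL".
                p.setState(location.substring(location.length() - 2));
                p.setCity(location.substring(0, location.length() - 2).replace(",","").trim());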

                String[] coordinates = new String[4];
                String coor = line.substring(33);

                String first = coor.substring(0, coor.indexOf(".") + 3);

                coor = coor.substring(first.length());

                String second = coor.substring(0, coor.indexOf(".") + 3);

                coor = coor.substring(second.length());

                String third = coor.substring(0, coor.indexOf(".") + 3);

                coor = coor.substring(third.length());

                String fourth = coor.substring(0, coor.indexOf(".") + 3);

                coordinates[0] = first;
                coordinates[1] = second;
                coordinates[2] = third;
                coordinates[3] = fourth;

                p.setCoordinates(coordinates);

                lst.add(p);
            }

        } finally {
            br.close();
        }

        for(Person p : lst){
            System.out.println(p.getName());
            System.out.println(p.getLastname());
            System.out.println(p.getCity());
            System.out.println(p.getState());
            for(String s : p.getCoordinates()){
                System.out.println(s);
            }

            System.out.println();
        }
    }

    public static class Person {

        public Person(){}

        private String name;
        private String lastname;
        private String city;
        private String state;
        private String[] coordinates;
        public String getName() {
            return name;
        }
        public void setName(String name) {
            this.name = name;
        }
        public String getLastname() {
            return lastname;
        }
        public void setLastname(String lastname) {
            this.lastname = lastname;
        }
        public String getCity() {
            return city;
        }
        public void setCity(String city) {
            this.city = city;
        }
        public String getState() {
            return state;
        }
        public void setState(String state) {
            this.state = state;
        }
        public String[] getCoordinates() {
            return coordinates;
        }
        public void setCoordinates(String[] coordinates) {
            this.coordinates = coordinates;
        }
    }

}

Upvotes: 0

Paul Vargas

Reputation: 42020

Read the file line by line, then take substrings of each line at the corresponding limits, e.g.:

private static String[] split(String line) {
    return new String[] {
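        // The end index of substring() is exclusive, so each field ends where the next begins.
        line.substring(0, 17).trim(),   // name
        line.substring(17, 33).trim(),  // location
        line.substring(33, 38).trim(),  // score 1
        line.substring(38, 44).trim(),  // score 2
        line.substring(44, 49).trim(),  // score 3
        line.substring(49, 54).trim(),  // score 4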
    };
}

Upvotes: 0

RealSkeptic

Reputation: 34628

This seems to be the good old fixed-position file format. It was highly popular in the days of punch card readers.

So basically, you read this file line by line, and then:

String name = line.substring(0,17).trim();
String location = line.substring(17,33).trim();

String[] scores = new String[4];
scores[0] = line.substring(33,38);
scores[1] = line.substring(38,44);
scores[2] = line.substring(44,49);
scores[3] = line.substring(49,54);

You can then go on to split the name by the space and the location by the comma (bearing in mind that Southwest FL has none), convert the scores into numbers, and so on.
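Those follow-up steps could look roughly like the sketch below. It assumes the state is always the last two letters of the location, which sidesteps the missing comma in Southwest FL, and that every name has exactly one first name followed by a last name. FieldParseSketch and its three helpers are hypothetical names.

```java
import java.util.Arrays;

public class FieldParseSketch {

    // Splits "First Last" on the first space only, giving {first, last}.
    static String[] splitName(String name) {
        return name.trim().split(" ", 2);
    }

    // Returns {city, state}. Assumption: the state is the last two characters,
    // so the comma is optional ("Southwest FL") and may lack a space ("Skokie Value,IL").
    static String[] splitLocation(String location) {
        String loc = location.trim();
        String state = loc.substring(loc.length() - 2);
        String city = loc.substring(0, loc.length() - 2).replace(",", "").trim();
        return new String[] { city, state };
    }

    // Converts the score strings into doubles.
    static double[] toNumbers(String[] scores) {
        double[] numbers = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            numbers[i] = Double.parseDouble(scores[i].trim());
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitName("Jason Brown")));
        System.out.println(Arrays.toString(splitLocation("Skokie Value,IL")));
        System.out.println(Arrays.toString(splitLocation("Southwest FL")));
        System.out.println(Arrays.toString(
                toNumbers(new String[] { "87.47", "182.61", "93.34", "89.27" })));
    }
}
```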

If you want to make all of the above more general, you can prepare a list of indexes, and create the array based on those indexes:

int[] fieldIndexes = { 0, 17,33,38,44,49,54 };
String values[] = new String[fieldIndexes.length - 1];

And then in your read loop (again I assume you read the line into line):

for ( int i = 1; i < fieldIndexes.length; i++ ) {

     values[i-1] = line.substring(fieldIndexes[i-1],fieldIndexes[i]).trim();

}

And then proceed to work with the values array.

Of course, make sure each line you read has the appropriate number of characters etc. so as to avoid out-of-bounds problems.
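As a rough sketch of that check, the index-based loop can be wrapped in a method that rejects lines shorter than the last index. The class name FixedWidthSketch, the method splitFixedWidth and the choice to throw IllegalArgumentException are assumptions made for illustration; skipping or logging the bad line would work just as well.

```java
import java.util.Arrays;

public class FixedWidthSketch {

    // Start index of each field, plus the end index of the last one.
    static final int[] FIELD_INDEXES = { 0, 17, 33, 38, 44, 49, 54 };

    // Hypothetical helper: cuts one line into trimmed fields, or fails if the line is too short.
    static String[] splitFixedWidth(String line) {
        int required = FIELD_INDEXES[FIELD_INDEXES.length - 1];
        if (line == null || line.length() < required) {
            throw new IllegalArgumentException(
                    "Expected at least " + required + " characters: " + line);
        }
        String[] values = new String[FIELD_INDEXES.length - 1];
        for (int i = 1; i < FIELD_INDEXES.length; i++) {
            values[i - 1] = line.substring(FIELD_INDEXES[i - 1], FIELD_INDEXES[i]).trim();
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        String line = "Robert PrzepioskiRochester, NY   47.00100.3449.2651.08";
        System.out.println(Arrays.toString(splitFixedWidth(line)));
    }
}
```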

Upvotes: 0

John Kuhns

Reputation: 506

Assuming the fields are fixed width, which they appear to be, you can use substring operations to get each field and then parse accordingly. Something like:

String name = line.substring(0, x);
String city_state = line.substring(x, y);
String num1 = line.substring(y, z);

Etc. where the x, y and z are the column breaks.

Upvotes: 0

havogt

Reputation: 2812

It looks like each column has a fixed size (number of characters). As you already said, you cannot split by tabs or spaces, and the last line makes this worse because there is no tab or space between the name and the city.

I propose reading one line at a time and cutting it up with line.substring(startIndex,endIndex). For example, line.substring(0,17) gives the name, since the name column is 17 characters wide (Robert Przepioski fills it completely). Then you can split the name into first and last name using the space as the delimiter.

Upvotes: 1
