Gh0stwarr10r

Reputation: 11

How to read formatted data from a text file in Java

For the past week I've been working on an assignment, and one of the things I have to do is read formatted data from a text file. By formatted I mean something like this:

{
    Marsha      1234     Florida   1268
    Jane        1523     Texas     4456
    Mark        7253     Georgia   1234
}

(Note: this is just an example. Not actual data from my assignment.)

I've been trying to figure this out on my own. I've tried reading each line as a string, using .substring() to extract certain parts of it, placing those parts into an array, and then printing each element by its index. I've tried a few variations of this idea and none of them work: I either get an error or the data is printed in a weird way. The assignment is due tomorrow and I have no idea what to do. Any help on this would be very much appreciated.

Upvotes: 0

Views: 2709

Answers (4)

Hulk

Reputation: 6583

For the example you have given, splitting the lines with the regex-pattern \s+ would work:

String s = "Marsha      1234     Florida   1268";
String[] parts = s.split("\\s+");

results in an array containing the 4 elements "Marsha", "1234", "Florida" and "1268".

The pattern I used matches one or more whitespace characters; see the Javadoc of java.util.regex.Pattern for details and other options.


Another approach is to define the pattern your line needs to match as a whole, and capture the groups you are interested in:

String s = "Marsha      1234     Florida   1268";

Pattern pattern = Pattern.compile("(\\w+)\\s+(\\d+)\\s+(\\w+)\\s+(\\d+)");
Matcher matcher = pattern.matcher(s);

if (!matcher.matches())
    throw new IllegalArgumentException("line does not match the expected pattern"); //or do whatever else is appropriate for your use case

String name = matcher.group(1);
String id = matcher.group(2);
String state = matcher.group(3);
String whatever = matcher.group(4);

This pattern requires the second and fourth group to consist only of digits.

Note however that both of these approaches will break down if your data can contain spaces as well - in this case you need different patterns.
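
To connect the split approach to an actual file, here is a minimal sketch. It assumes the file looks like the sample in the question (rows wrapped in { and } lines, fields separated by whitespace, no spaces inside a field). The class name FormattedFileReader, the method parseLines and the path "data.txt" are made up for illustration. Note the trim() call: the sample rows are indented, and split("\\s+") on a string with leading whitespace would return an empty first element.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class FormattedFileReader {

    // Turns raw lines into rows of fields, skipping blank lines and the brace lines.
    static List<String[]> parseLines(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim(); // drop indentation so no empty first field appears
            if (trimmed.isEmpty() || trimmed.equals("{") || trimmed.equals("}")) {
                continue;
            }
            rows.add(trimmed.split("\\s+"));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // "data.txt" is a placeholder; pass the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "data.txt");
        if (!Files.exists(path)) {
            System.err.println("File not found: " + path);
            return;
        }
        for (String[] row : parseLines(Files.readAllLines(path))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Keeping the parsing in its own method means it can be tried on a hard-coded list of strings before any file is involved.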

Upvotes: 2

dbl

Reputation: 1109

I really do believe that @JoniVR's advice will be helpful, and you should consider using a separator between the columns of each row. Currently you will not be able to parse composite data such as the first name "Mary Ann". Also, since the sample data you provided already has 4 columns, you should have a POJO that represents the data parsed from the file. A conceptual one looks like:

class MyPojo {

    private String name;
    private int postCode;
    private String state;
    private int cityId;

    public MyPojo(String name, int postCode, String state, int cityId) {
        this.name = name;
        this.postCode = postCode;
        this.state = state;
        this.cityId = cityId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getPostCode() {
        return postCode;
    }

    public void setPostCode(int postCode) {
        this.postCode = postCode;
    }

    public String getState() {
        return state;
    }

    public void setState(String state) {
        this.state = state;
    }

    public int getCityId() {
        return cityId;
    }

    public void setCityId(int cityId) {
        this.cityId = cityId;
    }

    @Override
    public String toString() {
        return "MyPojo{" +
            "name='" + name + '\'' +
            ", postCode=" + postCode +
            ", state='" + state + '\'' +
            ", cityId=" + cityId +
            '}';
    }
}

Then you would probably like to collect the errors found while validating the rows, so it's a good idea to have some kind of error class to store them (perhaps a properly designed one that extends Exception). A very simple class for the purpose would be:

class InsertionError {
    private String message;
    private int lineNumber;

    public InsertionError(String message, int lineNumber) {
        this.message = message;
        this.lineNumber = lineNumber;
    }

    @Override
    public String toString() {
        return "Error at line " + lineNumber + " -> " + message;
    }
}
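
If you prefer the Exception route mentioned above, a rough sketch could look like the following. RowFormatException and parseNumber are hypothetical names, not part of the solution in this answer, and this variant stops at the first bad value instead of collecting every error.

```java
// Hypothetical checked exception that carries the offending line number.
class RowFormatException extends Exception {
    private final int lineNumber;

    RowFormatException(String message, int lineNumber) {
        super("Error at line " + lineNumber + " -> " + message);
        this.lineNumber = lineNumber;
    }

    int getLineNumber() {
        return lineNumber;
    }
}

class RowFormatExceptionDemo {

    // Parses one numeric column, or reports which line held the bad value.
    static int parseNumber(String raw, int lineNumber) throws RowFormatException {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new RowFormatException("\"" + raw + "\" is not a numeric value.", lineNumber);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(parseNumber("1234", 1));  // prints 1234
            System.out.println(parseNumber("Texas", 4)); // throws
        } catch (RowFormatException e) {
            System.out.println(e.getMessage()); // prints: Error at line 4 -> "Texas" is not a numeric value.
        }
    }
}
```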

And then the solution itself should:
1. Split the lines.
2. Tokenize the columns of each row and parse/validate them.
3. Collect the column data in a useful Java representation.

Maybe something like:

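// Place these members inside a class. They need java.util.ArrayList, java.util.Arrays,
// java.util.List, java.util.Objects and java.util.stream.Collectors.
private static final int HEADERS_COUNT = 4;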
private static final int LINE_NUMBER_CURSOR = 0;

public static void main(String[] args) {
    String data =   "Marsha      1234     Florida   1268\n" +
                    "Jasmine     Texas    4456\n" +
                    "Jane        1523     Texas     4456\n" +
                    "Jasmine     Texas    2233      asd\n" +
                    "Mark        7253     Georgia   1234";

    int[] lineNumber = new int[1];

    List<InsertionError> errors = new ArrayList<>();

    List<MyPojo> insertedPojo = Arrays.stream(data.split("\n"))
        .map(x -> x.split("\\p{Blank}+"))
        .map(x -> {
            lineNumber[LINE_NUMBER_CURSOR]++;

            if (x.length == HEADERS_COUNT) {
                Integer postCode = null;
                Integer cityId = null;

                try {
                    postCode = Integer.valueOf(x[1]);
                } catch (NumberFormatException ignored) {
                    errors.add(new InsertionError("\"" + x[1] + "\" is not a numeric value.", lineNumber[LINE_NUMBER_CURSOR]));
                }

                try {
                    cityId = Integer.valueOf(x[3]);
                } catch (NumberFormatException ignored) {
                    errors.add(new InsertionError("\"" + x[3] + "\" is not a numeric value.", lineNumber[LINE_NUMBER_CURSOR]));
                }

                if (postCode != null && cityId != null) {
                    return new MyPojo(x[0], postCode, x[2], cityId);
                }
            } else {
                errors.add(new InsertionError("Columns count does not match headers count.", lineNumber[LINE_NUMBER_CURSOR]));
            }
            return null;
        })
        .filter(Objects::nonNull)
        .collect(Collectors.toList());

    errors.forEach(System.out::println);

    System.out.println("Number of successfully inserted Pojos is " + insertedPojo.size() + ". Respectively they are: ");

    insertedPojo.forEach(System.out::println);
}

This prints:

Error at line 2 -> Columns count does not match headers count.
Error at line 4 -> "Texas" is not a numeric value.
Error at line 4 -> "asd" is not a numeric value.
Number of successfully inserted Pojos is 3. Respectively they are:
MyPojo{name='Marsha', postCode=1234, state='Florida', cityId=1268}
MyPojo{name='Jane', postCode=1523, state='Texas', cityId=4456}
MyPojo{name='Mark', postCode=7253, state='Georgia', cityId=1234}

Upvotes: 0

Tiago Luna

Reputation: 111

There are many different approaches you can use to read this formatted file. I would suggest that you first extract the relevant data from the text as a list of strings and then break the lines into fields. Here is an example of how you can do this using the data sample you gave:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CustomTextReader {

    public static void main(String[] args) {
        String text =
                "Marsha      1234     Florida   1268\r\n" + 
                "Jane        1523     Texas     4456\r\n" + 
                "Mark        7253     Georgia   1234";

        //Extract the relevant data from the text as a list of arrays
        //  in which each array is a line, and each element is a field. 
        List<String[]> data = getData(text);
        //Just printing the results
        print(data);
    }

    private static List<String[]> getData(String text) {
        //1. Separate content into lines.
        return Arrays.stream(text.split("\r\n"))
                //2. Separate lines into fields.
                .map(s -> s.split("\\s{2,}"))
                .collect(Collectors.toList());
    }

    private static void print(List<String[]> data) {
        data.forEach(line -> {
            for(String field : line) {
                System.out.print(field + " | ");
            }
            System.out.println();
        });

    }
}

It's important to know what to expect from the data in terms of format. If you know that the fields don't contain whitespace, you can use \\s+ as the pattern for splitting the string in step 2. But if you think the data may contain fields with whitespace (e.g. "North Carolina"), it's better to use another regex such as \\s{2,}, which only splits on runs of two or more whitespace characters (that's what I did in the example above). I hope I helped you!
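
To make the difference between the two patterns concrete, here is a small sketch. The row with "Mary Ann" and "North Carolina" is made up for illustration, and the class and method names are hypothetical. It assumes that columns are always separated by at least two spaces and that fields never contain two spaces in a row.

```java
import java.util.Arrays;

class SplitPatternDemo {

    // Splits on any run of whitespace: breaks fields that contain a space.
    static String[] splitOnAnyWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    // Splits only on runs of two or more whitespace characters.
    static String[] splitOnWideGaps(String line) {
        return line.trim().split("\\s{2,}");
    }

    public static void main(String[] args) {
        String line = "Mary Ann    2760     North Carolina   9911"; // made-up row

        // prints [Mary, Ann, 2760, North, Carolina, 9911]
        System.out.println(Arrays.toString(splitOnAnyWhitespace(line)));

        // prints [Mary Ann, 2760, North Carolina, 9911]
        System.out.println(Arrays.toString(splitOnWideGaps(line)));
    }
}
```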

Upvotes: 0

Level_Up

Reputation: 824

First you must know the format of your file. As in your example, does it start with { and end with }? What are the separators between the data? For example, the separator can be a semicolon, whitespace and so on. Knowing this, you can start building the app. For your example I would write something like this:

import java.util.ArrayList;
import java.util.List;

public class MainClass
{

public static void main(String[] args)
{
    String s = "{\r\n"+
               "Marsha      1234     Florida   1268\r\n" + 
               "Jane        1523     Texas     4456\r\n" + 
               "Mark        7253     Georgia   1234\r\n"+
               "}\r\n";

    String[] rows = s.split("\r\n");

    //Here we will keep everything except the first and the last row
    List<String> importantRows = new ArrayList<>(rows.length-2);
    //lets assume that we do not need the first and the last row
    for(int i=0; i<rows.length; i++)
    {
        //String r = rows[i];
        //System.out.println(r);

        if(i>0 && i<rows.length-1)
        {
            importantRows.add(rows[i]);
        }

    }

    List<String> importantWords = new ArrayList<>(rows.length-2);
    //Now lets split every 'word' from row
    for(String rowImportantData : importantRows)
    {
        String[] oneRowData = rowImportantData.split(" ");

        //Here we will have one row like: [Marsha][][]...[1234][][]...[Florida][][]...[1268]
        // We need to remove the empty strings. They appear because there is more
        // than one space in a row. You could use a regex or another approach,
        // but I show this one because it also lets you drop any other data you do not need.
        for(String data : oneRowData)
        {
            if(!data.trim().isEmpty())
            {
                importantWords.add(data);
            }
            //System.out.println(data);
        }

    }

    //Now we have the words.
    //You must know the rules that apply for this data. Let's assume from your example that you have (Name Number) group
    //If we want to print every group (Name Number) and we have in this state list with [Name][Number][Name][Number]....
    //Then we can print it this way
    for(int i=0; i<importantWords.size()-1; i=i+2)
    {
        System.out.println(importantWords.get(i) + " " + importantWords.get(i+1));
    }

}

}

This is only one example. You can build your app in many different ways. The important part is to know the initial state of the information you want to handle and the result you want to achieve.
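
As one of those other ways, here is a rough sketch that uses java.util.Scanner instead of splitting strings by hand. It assumes every row is exactly name, number, state, number, and that no field contains whitespace. The class and method names are hypothetical. If a row does not follow that layout, nextInt() throws an InputMismatchException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerRowReader {

    // Reads whitespace-separated tokens four at a time: name, number, state, number.
    static List<String> readRows(String text) {
        List<String> rows = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                // Skip the opening and closing brace tokens.
                if (in.hasNext("\\{|\\}")) {
                    in.next();
                    continue;
                }
                String name = in.next();
                int firstNumber = in.nextInt();
                String state = in.next();
                int secondNumber = in.nextInt();
                rows.add(name + " " + firstNumber + " " + state + " " + secondNumber);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "{\n"
                + "    Marsha      1234     Florida   1268\n"
                + "    Jane        1523     Texas     4456\n"
                + "}\n";
        // prints:
        // Marsha 1234 Florida 1268
        // Jane 1523 Texas 4456
        readRows(text).forEach(System.out::println);
    }
}
```

Scanner also has constructors that take a File or a Path, so the same loop can read straight from the text file.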

Good luck!

Upvotes: 1
