sasklacz

Reputation: 3628

Improving data extraction from a text file in Java

I have a CSV file with sample data in this form:

220 30    255   0   0     Javascript
200 20      0 255 128     Thinking in java

The first column is the height, the second is the thickness, the next three are the RGB values for the color, and the last one is the title. Each must be treated as a separate variable. I have already written my own solution, but I'm wondering whether there is a better, easier, or shorter way of doing this. The extracted data will then be used to create a Book object; every Book goes into an array of books, which is then displayed with Swing. Here's the code:

private static Book[] addBook(Book b, Book[] bookTab){
        Book[] tmp = bookTab;
        bookTab = new Book[tmp.length+1];
        for(int i = 0; i < tmp.length; i++){
                bookTab[i] = tmp[i];
        }
        bookTab[tmp.length] = b;

        return bookTab;
}

public static void main(String[] args) {

    Book[] books = new Book[0];

    try {
        BufferedReader file = new BufferedReader(new FileReader("K:\\books.txt"));

        String s;
        while ((s = file.readLine()) != null) {
            int hei, thick, R, G, B;
            String tit;

            hei = Integer.parseInt(s.substring(0, 3).replaceAll(" ", ""));
            thick = Integer.parseInt(s.substring(4, 6).replaceAll(" ", ""));
            R = Integer.parseInt(s.substring(10, 13).replaceAll(" ", ""));
            G = Integer.parseInt(s.substring(14, 17).replaceAll(" ", ""));
            B = Integer.parseInt(s.substring(18, 21).replaceAll(" ", ""));

            tit = s.substring(26);

            System.out.println(tit+hei+thick+R+G+B);

            books = addBook(new Book(hei, thick, R, G, B, tit),books);
        }
        file.close();
    } catch (IOException e) {
        //do nothing
    }
}

Upvotes: 3

Views: 870

Answers (4)

Brent Writes Code

Reputation: 19623

You should consider using the java.util.Scanner class that was added in Java 5. It was specifically created for handling these sorts of File and String parsing situations.

Here's a brief example based on your file format (NOTE: I'm leaving out all of the associated error handling for clarity/brevity):

import java.util.Scanner;
import java.io.File;

class Dummy
{
    public static void main(String[] args) throws Exception
    {
        Scanner sc = new Scanner(new File("file.txt"));
        while (sc.hasNext())
        {
            // The five numeric columns are separated by whitespace
            int hei = sc.nextInt();
            int thick = sc.nextInt();
            int r = sc.nextInt();
            int g = sc.nextInt();
            int b = sc.nextInt();
            // The rest of the line is the title, which may contain spaces
            String title = sc.nextLine().trim();

            System.out.println("Book(" + hei + "," + thick + "," +
                r + "," + g + "," + b + "," + title + ")");
        }
        sc.close();
    }
}

The nice thing about Scanner is that it has constructors that take a String, a File, or other input sources such as an InputStream, so you can use it with just about anything. Hope that helps!
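As a small illustration of the String constructor mentioned above, here is a sketch that scans a single line held in memory instead of a file. The class name ScannerFromString and the method describe are made up for this example; the field order is the one from the question.

```java
import java.util.Scanner;

// Hypothetical helper, not part of the question's code.
class ScannerFromString {
    // Scans one line of the question's format directly from a String.
    static String describe(String line) {
        Scanner sc = new Scanner(line);
        int hei = sc.nextInt();
        int thick = sc.nextInt();
        int r = sc.nextInt();
        int g = sc.nextInt();
        int b = sc.nextInt();
        // Whatever is left on the line is the title (it may contain spaces)
        String title = sc.nextLine().trim();
        sc.close();
        return "Book(" + hei + "," + thick + "," + r + "," + g + "," + b + "," + title + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("200 20      0 255 128     Thinking in java"));
    }
}
```

This can be handy for unit tests, since no file on disk is needed.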

Upvotes: 0

BalusC

Reputation: 1109332

I have CSV file with sample data in this form

That's not a CSV file. That's a "fixed-width formatted" file.

I'm wondering if there are no better/easier/shorter ways of doing this

Use a real CSV file format. Parsing and formatting would then be easy with one of the many available Java CSV APIs, for example OpenCSV. You can even use it to convert between a List of beans (such as Book in your case) and a CSV file.

(from a comment) the file is already created and I must keep it in that form. What about regex ?

A regex would only make things worse, since the data is not in a regular format but in a fixed one! If you can't change the format, not even to CSV, then your approach is fine as it is. I would only replace replaceAll(" ", "") with trim(), since that is more efficient (the former compiles and applies a regex, while the latter just strips leading and trailing whitespace). Replacing Book[] with List<Book> is also a good suggestion, because it makes adding another book easier: you can just call books.add(book). Also see the Collections tutorial.
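Putting the two suggestions together, a minimal sketch could look like the following. The column offsets are the ones from the question's code; the Book class here is only a hypothetical stand-in (the asker's real class is not shown), and the names FixedWidthBooks, parseLine and readBooks are invented for the example. It assumes every non-blank line is at least 27 characters long.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the asker's Book class.
class Book {
    final int height, thickness, r, g, b;
    final String title;

    Book(int height, int thickness, int r, int g, int b, String title) {
        this.height = height;
        this.thickness = thickness;
        this.r = r;
        this.g = g;
        this.b = b;
        this.title = title;
    }
}

class FixedWidthBooks {
    // Parses one fixed-width line; trim() replaces replaceAll(" ", "").
    static Book parseLine(String s) {
        int height = Integer.parseInt(s.substring(0, 3).trim());
        int thickness = Integer.parseInt(s.substring(4, 6).trim());
        int r = Integer.parseInt(s.substring(10, 13).trim());
        int g = Integer.parseInt(s.substring(14, 17).trim());
        int b = Integer.parseInt(s.substring(18, 21).trim());
        String title = s.substring(26).trim();
        return new Book(height, thickness, r, g, b, title);
    }

    // Collects the books in a List, so no manual array copying is needed.
    static List<Book> readBooks(BufferedReader in) throws IOException {
        List<Book> books = new ArrayList<Book>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            books.add(parseLine(line));
        }
        return books;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample instead of K:\books.txt, so the sketch runs anywhere
        String sample = "220 30    255   0   0     Javascript\n"
                + "200 20      0 255 128     Thinking in java\n";
        List<Book> books = readBooks(new BufferedReader(new StringReader(sample)));
        for (Book book : books) {
            System.out.println(book.title + " (" + book.height + " x " + book.thickness + ")");
        }
    }
}
```

With a List, the addBook helper from the question is no longer needed. If an array is required later, books.toArray(new Book[0]) converts the list back.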

Upvotes: 1

trashgod

Reputation: 205875

StreamTokenizer seems made for this, as suggested in this example. It's a bit dated, but it can be fairly fast when used with a BufferedReader.
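Here is a rough sketch of how StreamTokenizer could be applied to the question's format. It is not the linked example; the class and method names (StreamTokenizerBooks, readRows, toRow) are invented here. The tokenizer is reconfigured so that every non-whitespace run is a word token, which avoids the default handling of quotes, slashes and numbers inside titles. It assumes each line has at least five numeric columns, and it collapses runs of whitespace inside a title to a single space.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerBooks {
    // Returns one row per line: {height, thickness, r, g, b, title}, all as Strings.
    static List<String[]> readRows(Reader in) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();            // drop the default quote/comment/number rules
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        st.wordChars('!', 255);      // everything else is part of a word
        st.eolIsSignificant(true);   // report line ends as TT_EOL

        List<String[]> rows = new ArrayList<String[]>();
        List<String> tokens = new ArrayList<String>();
        int type;
        do {
            type = st.nextToken();
            if (type == StreamTokenizer.TT_WORD) {
                tokens.add(st.sval);
            } else if (!tokens.isEmpty()) {
                // TT_EOL or TT_EOF closes the current line
                rows.add(toRow(tokens));
                tokens.clear();
            }
        } while (type != StreamTokenizer.TT_EOF);
        return rows;
    }

    // The first five tokens are the numeric columns; the rest form the title.
    static String[] toRow(List<String> tokens) {
        String[] row = new String[6];
        for (int i = 0; i < 5; i++) {
            row[i] = tokens.get(i);
        }
        StringBuilder title = new StringBuilder();
        for (int i = 5; i < tokens.size(); i++) {
            if (i > 5) {
                title.append(' ');
            }
            title.append(tokens.get(i));
        }
        row[5] = title.toString();
        return row;
    }

    public static void main(String[] args) throws IOException {
        String sample = "220 30    255   0   0     Javascript\n"
                + "200 20      0 255 128     Thinking in java\n";
        for (String[] row : readRows(new StringReader(sample))) {
            int height = Integer.parseInt(row[0]);
            System.out.println(row[5] + ": height " + height);
        }
    }
}
```

For a real file, wrap a FileReader in a BufferedReader and pass that to readRows.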

Upvotes: 0

nevets1219

Reputation: 7706

You shouldn't use substring since that restricts the format/length of your data. If you have some control over how the CSV is generated (specifically the delimiter) you can use StringTokenizer. You may want to use an array to represent a single line's worth of data as well (defining a few constants to help clarify which element represents what).
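A sketch of the array-plus-constants idea follows. Note that it swaps in String.split with a limit instead of StringTokenizer, because the limit keeps a multi-word title in one piece; the StringTokenizer documentation itself recommends split for new code. The class name SplitBooks, the constant names and splitLine are invented for illustration, and the sketch assumes whitespace-delimited columns with the title last.

```java
class SplitBooks {
    // Index of each column in the array returned by splitLine
    static final int HEIGHT = 0;
    static final int THICKNESS = 1;
    static final int RED = 2;
    static final int GREEN = 3;
    static final int BLUE = 4;
    static final int TITLE = 5;
    static final int FIELD_COUNT = 6;

    // Splits on runs of whitespace, at most FIELD_COUNT - 1 times,
    // so the title keeps its internal spaces.
    static String[] splitLine(String line) {
        String[] fields = line.trim().split("\\s+", FIELD_COUNT);
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException("Expected " + FIELD_COUNT + " fields: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = splitLine("200 20      0 255 128     Thinking in java");
        int height = Integer.parseInt(fields[HEIGHT]);
        int green = Integer.parseInt(fields[GREEN]);
        System.out.println(fields[TITLE] + ": height " + height + ", green " + green);
    }
}
```

Unlike the substring version, this does not depend on column widths, so a four-digit height or a wider gap between columns would still parse.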

Upvotes: 0
