Hirthas

Reputation: 359

openCSV not reading my entire file

I have a Java application that uses openCSV to read a very large file. I put the 4th column into a HashSet (eventually another column or two will be added, if that makes a difference) and write the set to a new file. This all seems to work fine, but I discovered it only reads part of the file (131,544 of 272,948 lines). Is this a limitation of openCSV or of Java in general, and is there a way around it?

My code for reference:

public static void main(String[] args) throws IOException {
    String itemsFile = "items.txt";
    String outFile = "so.txt";
    HashSet<String> brands = new HashSet<>();

    // Read the tab-separated file; try-with-resources closes the reader,
    // and a missing file surfaces as an IOException instead of a later NPE.
    try (CSVReader reader = new CSVReader(new FileReader(itemsFile), '\t')) {
        String[] nextLine;
        while ((nextLine = reader.readNext()) != null) {
            brands.add(nextLine[4]); // column at zero-based index 4
        }
    }

    // Copy the set into an array so it can be written as a single record.
    String[] brandArray = brands.toArray(new String[0]);

    // Using '\n' as the separator puts each brand on its own line.
    try (CSVWriter writer = new CSVWriter(new FileWriter(outFile), '\n')) {
        writer.writeNext(brandArray);
    }
}

I apologize if my code is messy; this is my first real "completed" Java application. Any assistance is much appreciated.

I've even tried removing those lines from the txt file to make sure it isn't hanging on some character, but it seems to stop at that line anyway.

Upvotes: 7

Views: 8385

Answers (2)

D-rk

Reputation: 5919

For me, the issue was a bug in OpenCSV 3.4 that occurs when the end of a line coincides with the end of the BufferedReader's buffer.

This test shows the bug:

    @Test
    void readWithBufferSize() throws IOException {

        for (int bufferSize = 2; bufferSize <= 3; bufferSize++) {
            // A <CR> <LF> B <NULL>
            byte[] content = {65, 13, 10, 66, 0};

            InputStream is = new ByteArrayInputStream(content);
            BufferedReader bfReader = new BufferedReader(new InputStreamReader(is), bufferSize);
            CSVReader reader = new CSVReader(bfReader);

            List<String> rows = new ArrayList<>();
            String[] cols;
            while((cols = reader.readNext()) != null) {
                rows.add(String.join(",", cols));
            }

            System.out.printf("buffer size: %d rows: %s%n", bufferSize, String.join(",", rows));
            // this fails for bufferSize = 3
            assert (rows.size() == 2);
        }
    }

Upvotes: 0

Hirthas

Reputation: 359

OK, I figured this out thanks to user @Michael in chat. Apparently openCSV can't handle such a large file because it is not streaming, so I looked into streaming the file line by line, and it works great.

Here's the final code:

public static void main(String[] args) throws IOException {
    String fileName = "items.txt";
    String outputFile = "so.txt";
    HashSet<String> brand = new HashSet<>();

    // Stream the file one line at a time; try-with-resources closes the reader.
    try (BufferedReader myInput = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName)))) {
        String thisLine;
        while ((thisLine = myInput.readLine()) != null) {
            String[] line = thisLine.split("\t");
            // Column 2 is only collected when column 20 is flagged with "1".
            if (line[20].equals("1")
                    && !line[2].equals("") && !line[2].equals(" ")) {
                if (line[2].indexOf("'") > -1) {
                    System.out.println(line[2]);
                    // "\\'" is a backslash followed by a quote;
                    // "\'" alone is just a plain quote in Java.
                    line[2] = line[2].replace("'", "\\'");
                    System.out.println(line[2]);
                }
                brand.add(line[2]);
            }
            if (!line[3].equals("") && !line[3].equals(" ")) {
                line[3] = line[3].replace("'", "\\'");
                brand.add(line[3]);
            }
            if (!line[4].equals("") && !line[4].equals(" ")) {
                if (line[4].indexOf("'") > -1) {
                    System.out.println(line[4]);
                    line[4] = line[4].replace("'", "\\'");
                    System.out.println(line[4]);
                }
                brand.add(line[4]);
            }
        }
    }

    String[] brands = brand.toArray(new String[brand.size()]);

    // Write the brands as one comma-separated list of single-quoted values.
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile))) {
        for (int i = 0; i < brands.length; i++) {
            bw.write((i == 0 ? "'" : ",'") + brands[i] + "'");
        }
    } catch (IOException e) {
        System.out.println(e.getMessage());
        e.printStackTrace();
    }
}
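
For anyone who only needs the distinct values of a single column, the line-by-line idea can be boiled down to something like the sketch below. It is not the exact code above: `ColumnCollector` and `collectColumn` are names made up for illustration, it assumes the file is UTF-8 and plainly tab-separated (no quoted fields), and it skips rows that are too short instead of throwing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, named here only for illustration.
class ColumnCollector {

    // Collects the distinct, non-blank values of one tab-separated column,
    // reading the file one line at a time.
    static Set<String> collectColumn(Path file, int column) throws IOException {
        Set<String> values = new LinkedHashSet<>();
        // Assumption: the file is UTF-8 encoded.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String row;
            while ((row = in.readLine()) != null) {
                // A limit of -1 keeps trailing empty cells in the result.
                String[] cells = row.split("\t", -1);
                if (cells.length <= column) {
                    continue; // row too short: skip it instead of failing
                }
                String value = cells[column].trim();
                if (!value.isEmpty()) {
                    values.add(value);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // "items.txt" and index 4 mirror the question; adjust as needed.
        Set<String> brands = collectColumn(Paths.get("items.txt"), 4);
        System.out.println(brands.size() + " distinct values");
    }
}
```

Only one line is held in memory at a time besides the set itself, so the file size should not matter much; the set grows with the number of distinct values, not the number of lines.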

Thanks for everyone's help on this.

Upvotes: 10
