Sasha

Reputation: 1570

Skip blank lines while reading .csv file using opencsv (java)

Good day everyone! My goal is to make the CSV reader skip blank lines while parsing a file, so that I only get the rows with at least one value. At the moment I have two methods: the first reads all rows as a List of String arrays and returns it; the second converts that result into a List of Lists of Strings. Both are below:

private List<String[]> readCSVFile(File filename) throws IOException {

    // try-with-resources closes the reader even if readAll() throws
    try (CSVReader reader = new CSVReader(new FileReader(filename))) {
        return reader.readAll();
    }

}

public List<List<String>> readFile(File filename) throws IOException {

        List<String[]> allRows = readCSVFile(filename);     
        List<List<String>> allRowsAsLists = new ArrayList<List<String>>();      
        for (String[] rowItemsArray :  allRows) {
            List<String> rowItems = new ArrayList<String>();
            rowItems.addAll(Arrays.asList(rowItemsArray));
            allRowsAsLists.add(rowItems);

        }
    return allRowsAsLists;

}

My first thought was to check the length of the array in the second method and simply skip the row if it is 0, which would be something like this:

for (String[] rowItemsArray :  allRows) {
            if (rowItemsArray.length == 0) continue;
            List<String> rowItems = new ArrayList<String>();
            rowItems.addAll(Arrays.asList(rowItemsArray));
            allRowsAsLists.add(rowItems);

}

Unfortunately that didn't work, because even a blank row still comes back as an array of elements (empty Strings, in fact). Checking each String by hand is not an option, as there are 100+ columns and the number varies. Please suggest the best way to achieve this. Thanks.

Sorted it out this way:

    public List<List<String>> readFile(File filename) throws IOException {

            List<String[]> allRows = readCSVFile(filename, includeHeaders, trimWhitespacesInFieldValues);       
            List<List<String>> allRowsAsLists = new ArrayList<List<String>>();      
            for (String[] rowItemsArray :  allRows) {
                if (allValuesInRowAreEmpty(rowItemsArray)) continue;
                List<String> rowItems = new ArrayList<String>();
                rowItems.addAll(Arrays.asList(rowItemsArray));
                allRowsAsLists.add(rowItems);

            }
            return allRowsAsLists;

        }

    private boolean allValuesInRowAreEmpty(String[] row) {
        for (String s : row) {
            // Stop at the first non-empty value; null fields count as empty
            if (s != null && s.length() != 0) {
                return false;
            }
        }
        return true;
    }
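
For anyone who wants to try the filtering step without a CSV file at hand, here is a minimal sketch that works on an in-memory list of rows. The class and method names (BlankRowFilter, allValuesEmpty, toNonBlankLists) are made up for illustration and are not part of opencsv, and treating null fields as empty is my own assumption:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of opencsv.
class BlankRowFilter {

    // True when every cell is null or the empty string (assumption: null counts as empty).
    static boolean allValuesEmpty(String[] row) {
        return Arrays.stream(row).allMatch(s -> s == null || s.isEmpty());
    }

    // Drops rows whose cells are all empty and copies the rest into mutable lists.
    static List<List<String>> toNonBlankLists(List<String[]> rows) {
        List<List<String>> result = new ArrayList<>();
        for (String[] row : rows) {
            if (allValuesEmpty(row)) {
                continue; // skip blank rows such as [""] or ["", ""]
            }
            result.add(new ArrayList<>(Arrays.asList(row)));
        }
        return result;
    }
}
```

Passing the result of reader.readAll() to BlankRowFilter.toNonBlankLists(...) should give the same List<List<String>> as readFile above, minus the blank rows. Note that a cell containing only spaces still counts as a value here.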

Upvotes: 9

Views: 21260

Answers (8)

Muchtar

Reputation: 122

Another way to do it is to implement the allowLine() method of the CsvToBeanFilter interface and pass the filter to .withFilter() when building the bean (example in Kotlin; the same approach should also work in Java):

import com.opencsv.bean.CsvToBeanFilter
import com.opencsv.bean.CsvToBeanBuilder
import java.io.FileReader

class NonBlankLinesFilter : CsvToBeanFilter {
    override fun allowLine(line: Array<String?>?): Boolean {
        if (line == null) {
            return false
        }
        for (value in line) {
            if (!value.isNullOrBlank()) {
                return true
            }
        }
        return false
    }
}

fun main() {
  val reader = FileReader("path/to/csv/file.csv")

  val csvToBean = CsvToBeanBuilder<MyBean>(reader)
        .withType(MyBean::class.java)
        .withSeparator(',')
        .withFilter(NonBlankLinesFilter())
        .build()

  val uploadList = csvToBean.parse()

  println(uploadList)
}

Upvotes: 0

i.karayel

Reputation: 4885

You can use a filter with a lambda, like below:

CsvToBean<T> csvToBean = new CsvToBeanBuilder<T>(new StringReader(CSV_HEADER + "\n" + lines))
    .withType(clazz)
    .withFieldAsNull(CSVReaderNullFieldIndicator.EMPTY_SEPARATORS)
    .withSeparator(delimiter)
    .withSkipLines(skipLines)
    .withIgnoreLeadingWhiteSpace(true).withFilter(strings -> {
      for (String r : strings) {
        if (r != null && r.length() > 0) {
          return true;
        }
      }
      return false;
    }).build();

Your lambda filter:

.withFilter(strings -> {
      for (String r : strings) {
        if (r != null && r.length() > 0) {
          return true;
        }
      }
      return false;
    })

Upvotes: 1

theINtoy

Reputation: 3708

The JavaDoc for CsvToBeanFilter states "Here's an example showing how to use CsvToBean that removes empty lines. Since the parser returns an array with a single empty string for a blank line that is what it is checking." and lists an example of how to do this:

private class EmptyLineFilter implements CsvToBeanFilter {

    private final MappingStrategy strategy;

    public EmptyLineFilter(MappingStrategy strategy) {
        this.strategy = strategy;
    }

    public boolean allowLine(String[] line) {
        boolean blankLine = line.length == 1 && line[0].isEmpty();
        return !blankLine;
    }

 }

 public List<Feature> parseCsv(InputStreamReader streamReader) {
    HeaderColumnNameTranslateMappingStrategy<Feature> strategy = new HeaderColumnNameTranslateMappingStrategy();
    Map<String, String> columnMap = new HashMap();
    columnMap.put("FEATURE_NAME", "name");
    columnMap.put("STATE", "state");
    strategy.setColumnMapping(columnMap);
    strategy.setType(Feature.class);
    CSVReader reader = new CSVReader(streamReader);
    CsvToBeanFilter filter = new EmptyLineFilter(strategy);
    return new CsvToBean().parse(strategy, reader, filter);
 }

Upvotes: 0

JiP

Reputation: 3268

If you do not parse into a bean, you can use the Java Streams API to filter out invalid CSV rows. My approach is like this (where the variable is is a java.io.InputStream instance with the CSV data, and YourBean map(String[] row) is your mapping method that maps a CSV row to your Java object):

CSVParser csvp = new CSVParserBuilder()
    .withSeparator(';')
    .withFieldAsNull(CSVReaderNullFieldIndicator.BOTH)
    .build();
CSVReader csvr = new CSVReaderBuilder(new InputStreamReader(is))
    .withCSVParser(csvp)
    .build();
List<YourBean> result = StreamSupport.stream(csvr.spliterator(), false)
    .filter(Objects::nonNull)
    .filter(row -> row.length > 0)
    .map(row -> map(row))
    .collect(Collectors.toList());

Upvotes: 0

Patrik

Reputation: 97

Here is an updated solution with lambdas based on @Martin's solution:

InputStream inputStream; // provided
List<MyBean> data = new CsvToBeanBuilder(new BufferedReader(new InputStreamReader(inputStream)))
    .withType(MyBean.class)
    // This filter ignores empty lines from the input
    .withFilter(stringValues -> Arrays.stream(stringValues)
        .anyMatch(value -> value != null && value.length() > 0))
    .build()
    .parse();

Upvotes: 2

Martin

Reputation: 145

For opencsv 5.0 there is an API option to read CSV lines directly into a bean.

For people who prefer the "CsvToBean" feature, the following solution uses the (sadly deprecated) #withFilter(..) method on CsvToBeanBuilder to skip blank lines in the InputStream:

InputStream inputStream; // provided
List<MyBean> data = new CsvToBeanBuilder(new BufferedReader(new InputStreamReader(inputStream)))
    .withType(MyBean.class)
    .withFilter(new CsvToBeanFilter() {
        /*
         * This filter ignores empty lines from the input
         */
        @Override
        public boolean allowLine(String[] strings) {
            for (String one : strings) {
                if (one != null && one.length() > 0) {
                    return true;
                }
            }
            return false;
        }
    }).build().parse();

Update: With opencsv Release 5.1 (dated 2/2/2020), CsvToBeanFilter got undeprecated as per feature request #120.

Upvotes: 8

chris

Reputation: 1785

You could concatenate all string values in a row after trimming them. If the resulting string is empty, no cell in the row holds a value, so the line can be ignored.
Something like this:

private boolean onlyEmptyCells(ArrayList<String> check) {
    StringBuilder sb = new StringBuilder();
    for (String s : check) {
        sb.append(s.trim());
    }
    return sb.toString().isEmpty(); //<- ignore 'check' if this returns true
}
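
A possible variation on the same idea, sketched under my own assumptions: it returns as soon as it finds a non-blank cell instead of building the concatenated string, which may matter with 100+ columns, and it treats null cells as blank. BlankCells and onlyBlankCells are hypothetical names, not opencsv API:

```java
import java.util.List;

// Hypothetical helper for illustration; not part of opencsv.
class BlankCells {

    // True when every cell is null, empty, or whitespace only.
    static boolean onlyBlankCells(List<String> cells) {
        for (String s : cells) {
            if (s != null && !s.trim().isEmpty()) {
                return false; // found a real value, no need to look further
            }
        }
        return true;
    }
}
```

Unlike a plain isEmpty() check, a row such as ["", "  "] is reported as blank here, because each cell is trimmed first.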

Upvotes: 0

SubOptimal

Reputation: 22983

You could check the length and the first element. If the line contains only a field separator, then the length is greater than 1. If the line contains a single space character, then the first element is not empty.

if (rowItemsArray.length == 1 && rowItemsArray[0].isEmpty()) {
    continue;
}
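
To make the three cases concrete, here is a small sketch of that check as a standalone method. BlankLineCheck and isBlankLine are hypothetical names, and the extra null guard is my own assumption for readers configured to return null fields:

```java
// Hypothetical helper for illustration; not part of opencsv.
class BlankLineCheck {

    // Assumes a blank line is parsed as an array holding a single empty (or null) string.
    static boolean isBlankLine(String[] row) {
        return row.length == 1 && (row[0] == null || row[0].isEmpty());
    }
}
```

With this check, a row of [""] is skipped, while ["", ""] (a line with only a separator) and [" "] (a line with a single space) are both kept.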

Upvotes: 9
