Michael Heneghan

Reputation: 307

Parsing .csv file using Java 8 Stream

I have a .csv file containing data on over 500 companies. Each row in the file holds the dataset for one particular company. I need to parse this file and extract data from each row in order to call 4 different web services.

The first line of the .csv file contains the column names. I am trying to write a method that takes a String parameter corresponding to one of the column titles in the .csv file.

Based on this parameter, I want the method to parse the file using Java 8's Stream functionality and return a list of the values found under that column title, one per row/company.

I feel like I am making it more complicated than it needs to be but cannot think of a more efficient way to achieve my goal.

Any thoughts or ideas would be greatly appreciated.

Searching through Stack Overflow, I found the following post, which is similar but not quite the same: "Parsing a CSV file for a unique row using the new Java 8 Streams API".

public static List<String> getData(String titleToSearchFor) throws IOException{
    Path path = Paths.get("arbitoryPath");
    int titleIndex;
    String retrievedData = null;
    List<String> listOfData = null;

    if(Files.exists(path)){ 
        try(Stream<String> lines = Files.lines(path)){
            List<String> columns = lines
                    .findFirst()
                    .map((line) -> Arrays.asList(line.split(",")))
                    .get();

            titleIndex = columns.indexOf(titleToSearchFor);

            List<List<String>> values = lines
                    .skip(1)
                    .map(line -> Arrays.asList(line.split(",")))
                    .filter(list -> list.get(titleIndex) != null)
                    .collect(Collectors.toList());

            String[] line = (String[]) values.stream().flatMap(l -> l.stream()).collect(Collectors.collectingAndThen(
                    Collectors.toList(), 
                    list -> list.toArray()));
            String value = line[titleIndex];
            if(value != null && value.trim().length() > 0){
                retrievedData = value;
            }
            listOfData.add(retrievedData);
        }
    }
    return listOfTitles;
}

Thanks

Upvotes: 12

Views: 61425

Answers (4)

ixeption

Reputation: 2060

You should not reinvent the wheel; use a common CSV parser library instead. For example, you can just use Apache Commons CSV.

It will handle a lot of things for you and is much more readable. There is also OpenCSV, which is even more powerful and comes with annotation-based mapping to data classes.

try (Reader reader = Files.newBufferedReader(Paths.get("file.csv"));
     CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT
             .withFirstRecordAsHeader())) {
    for (CSVRecord csvRecord : csvParser) {
        // Access a value by its column name (taken from the header row)
        String name = csvRecord.get("MyColumn");
        // (..)
    }
}
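
To make "it will handle a lot of things for you" concrete, here is a minimal sketch of two cases where a plain line.split(",") miscounts fields. The class name, method name and sample rows are made up for illustration; only the JDK is used.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveSplitDemo {
    // Splits on every comma, the same way line.split(",") does in the question.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        // Hypothetical row: the quoted company name itself contains a comma.
        String quoted = "\"Acme, Inc.\",42";
        System.out.println(naiveSplit(quoted).size()); // 3 fields instead of the intended 2

        // Hypothetical row with an empty last column: split drops trailing empty strings.
        String trailingEmpty = "IBM,140,";
        System.out.println(naiveSplit(trailingEmpty).size()); // 2 fields instead of 3
    }
}
```

A CSV library is expected to deal with quoting and empty fields, which is one reason to prefer it over hand-rolled splitting.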

Edit: Anyway, if you really want to do it on your own, take a look at this example.

Upvotes: 22

Andbdrew

Reputation: 11895

As usual, you should use Jackson! Its CSV support lives in the jackson-dataformat-csv module. Check out the docs.

If you want Jackson to use the first line as header info:

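import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.IOException;
import java.util.List;
import java.util.Map;

public class CsvExample {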
    public static void main(String[] args) throws IOException {
        String csv = "name,age\nIBM,140\nBurger King,76";
        CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
        ObjectMapper mapper = new CsvMapper();
        MappingIterator<Map<String, String>> it = mapper.readerFor(Map.class).with(bootstrapSchema).readValues(csv);
        List<Map<String, String>> maps = it.readAll();
    }
}

Or you can define your schema as a Java object:

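import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.IOException;
import java.util.List;

public class CsvExample {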
    private static class Pojo {
        private final String name;
        private final int age;

        @JsonCreator
        public Pojo(@JsonProperty("name") String name, @JsonProperty("age") int age) {
            this.name = name;
            this.age = age;
        }

        @JsonProperty("name")
        public String getName() {
            return name;
        }

        @JsonProperty("age")
        public int getAge() {
            return age;
        }
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,age\nIBM,140\nBurger King,76";
        CsvSchema bootstrapSchema = CsvSchema.emptySchema().withHeader();
        ObjectMapper mapper = new CsvMapper();
        MappingIterator<Pojo> it = mapper.readerFor(Pojo.class).with(bootstrapSchema).readValues(csv);
        List<Pojo> pojos = it.readAll();
    }
}

Upvotes: 1

Andrew

Reputation: 49656

I managed to shorten your snippet a bit.

If I understand you correctly, you need all the values of a particular column, and the name of that column is given.

The idea is the same, but I improved the file reading (the file is read only once) and removed code duplication (like line.split(",")) and the unnecessary wrapping in a List (Collectors.toList()).

// assumes static imports of Files.lines, Collectors.toList and Arrays.asList
// read lines once
List<String[]> lines = lines(path).map(l -> l.split(","))
                                  .collect(toList());

// find the title index
int titleIndex = lines.stream()
                      .findFirst()
                      .map(header -> asList(header).indexOf(titleToSearchFor))
                      .orElse(-1);

// collect needed values
return lines.stream()
            .skip(1)
            .map(row -> row[titleIndex])
            .collect(toList());

I've got 2 tips not related to the issue:

1. You have hardcoded the file path; it's better to move the value to a constant or add a method parameter.
2. You could move the main part out of the if clause by checking the opposite condition, !Files.exists(path), and throwing an exception.
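
The two tips can be sketched together as follows. This is only one possible shape, not the method from the question: the class name ColumnReader, the extra Path parameter, the choice of NoSuchFileException and IllegalArgumentException, and the decision to skip rows that are too short are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ColumnReader {
    // Tip 1: the path is a parameter instead of a literal inside the method.
    static List<String> getData(Path path, String titleToSearchFor) throws IOException {
        // Tip 2: guard clause, so the main logic is not nested inside an if block.
        if (!Files.exists(path)) {
            throw new NoSuchFileException(path.toString());
        }
        List<String[]> rows;
        // try-with-resources closes the file handle held by Files.lines.
        try (Stream<String> lines = Files.lines(path)) {
            rows = lines.map(line -> line.split(",")).collect(Collectors.toList());
        }
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("Empty file: " + path);
        }
        int titleIndex = Arrays.asList(rows.get(0)).indexOf(titleToSearchFor);
        if (titleIndex < 0) {
            throw new IllegalArgumentException("No such column: " + titleToSearchFor);
        }
        return rows.stream()
                .skip(1)                                // skip the header row
                .filter(row -> row.length > titleIndex) // assumption: ignore rows that are too short
                .map(row -> row[titleIndex])
                .collect(Collectors.toList());
    }
}
```

A call such as ColumnReader.getData(Paths.get("companies.csv"), "name") would then return the "name" column, where companies.csv is a placeholder file name.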

Upvotes: 3

davidxxx

Reputation: 131516

1) You cannot invoke multiple terminal operations on a Stream.
But you invoke two of them: findFirst() to retrieve the column names and then collect() to collect the line values. The second terminal operation invoked on the Stream will throw an IllegalStateException.

2) Instead of Stream<String> lines = Files.lines(path), which exposes all lines as a single-use Stream, you should do things in two steps by using Files.readAllLines(), which returns a List of String.
Use the first element to retrieve the column names, and use the whole list to retrieve the value of each line matching the criteria.

3) You split the retrieval into several small steps that you can shorten into a single stream pipeline that iterates over all lines, keeps only those that match the criteria, and collects them.

That would give something like:

public static List<String> getData(String titleToSearchFor) throws IOException {
    Path path = Paths.get("arbitoryPath");

    if (Files.exists(path)) {
        List<String> lines = Files.readAllLines(path);

        List<String> columns = Arrays.asList(lines.get(0)
                                                  .split(","));

        int titleIndex = columns.indexOf(titleToSearchFor);

        List<String> values = lines.stream()
                                   .skip(1)
                                   .map(line -> Arrays.asList(line.split(",")))
                                   .map(list -> list.get(titleIndex))
                                   .filter(Objects::nonNull)
                                   .filter(s -> s.trim()
                                                 .length() > 0)
                                   .collect(Collectors.toList());

        return values;
    }

    return new ArrayList<>();

}
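
Point 1 above can be checked with a small experiment. This is a sketch with made-up names and sample data, using an in-memory Stream rather than Files.lines; as far as I know the JDK rejects any further use of a consumed stream, so the exception is already raised when the second pipeline is being built.

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamReuseDemo {
    // Returns true if reusing a stream after a terminal operation is rejected.
    static boolean secondUseFails() {
        Stream<String> lines = Stream.of("name,age", "IBM,140", "Burger King,76");
        lines.findFirst(); // first terminal operation: the stream is now consumed
        try {
            // Same pattern as in the question: a second pipeline on the same stream.
            lines.skip(1).collect(Collectors.toList());
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails()); // true
    }
}
```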

Upvotes: 1
