Joseph Gagnon

Reputation: 2115

Using Jackson to convert CSV to JSON - How to remove newlines embedded in CSV column header

After some quick Googling, I found an easy way to read and parse a CSV file to JSON using the Jackson library. All well and good, except ... some of the CSV header column names have embedded newlines. The program handles it, but I'm left with JSON keys with newlines embedded within. I'd like to remove these (or replace them with a space).

Here is the simple program I found:

import java.io.File;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CSVToJSON {

  public static void main(String[] args) throws Exception {
    File input = new File("PDM_BOM.csv");
    File output = new File("output.json");

    CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
    CsvMapper csvMapper = new CsvMapper();

    // Read data from CSV file
    List<Object> readAll = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
        .readAll();

    ObjectMapper mapper = new ObjectMapper();

    // Write JSON formatted data to output.json file
    mapper.writerWithDefaultPrettyPrinter().writeValue(output, readAll);

    // Write JSON formatted data to stdout
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll));
  }
}

So, as an example:

PARENT\nITEM\nNUMBER

Here's an example of what is produced:

"PARENT\nITEM\nNUMBER" : "208E8840040",

I need this to be:

"PARENT ITEM NUMBER" : "208E8840040",

Is there a configuration setting on the Jackson mapper that can handle this? Or, do I need to provide some sort of custom "handler" to the mapper?

Special cases

To add some complexity, there are cases where just replacing the newline with a space will not always yield what is needed.

Example 1:

Sometimes there is a column header like this:

QTY\nORDER/\nTRANSACTION

In this case, I need the newline removed and replaced with nothing, so that the result is:

QTY ORDER/TRANSACTION, not QTY ORDER/ TRANSACTION

Example 2:

Sometimes, for whatever reason, a column header has a space before the newline:

EFFECTIVE \nTHRU DATE

This needs to come out as:

EFFECTIVE THRU DATE (one space between words), not EFFECTIVE  THRU DATE (two spaces after EFFECTIVE)

Any ideas on how to handle at least the main issue would be very much appreciated.
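A minimal sketch of the header clean-up described above, independent of Jackson. The class and method names (HeaderCleaner, clean) are hypothetical, and the rule "a line break right after a slash is dropped, every other run of whitespace becomes one space" is an assumption inferred from the three examples, not a general rule for every header:

```java
final class HeaderCleaner {

    // Hypothetical helper: turns a multi-line CSV header into a single-line key.
    static String clean(String header) {
        // Step 1: a line break directly after "/" (optionally preceded by
        // spaces or tabs) is removed, so "ORDER/\nTRANSACTION" is joined
        // without a space.
        String joined = header.replaceAll("/[ \\t]*\\r?\\n\\s*", "/");

        // Step 2: any remaining run of whitespace, including newlines and a
        // space followed by a newline, collapses to one space.
        return joined.replaceAll("\\s+", " ").trim();
    }
}
```

With these rules, HeaderCleaner.clean("QTY\nORDER/\nTRANSACTION") gives QTY ORDER/TRANSACTION and HeaderCleaner.clean("EFFECTIVE \nTHRU DATE") gives EFFECTIVE THRU DATE.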

Upvotes: 0

Views: 2201

Answers (2)

Joseph Gagnon

Reputation: 2115

OK, came up with a solution. It's ugly, but it works. Basically, after the CsvMapper finishes, I go through the giant ugly collection that's produced and do a String.replaceAll (thanks to https://stackoverflow.com/users/4402505/prem-kurian-philip for that suggestion) to remove the unwanted characters and then rebuild the map.

In any case here's the new code:

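import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CSVToJSON {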

  public static void main(String[] args) throws Exception {
    File input = new File("PDM_BOM.csv");
    File output = new File("output.json");

    CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
    CsvMapper csvMapper = new CsvMapper();

    // Read data from CSV file
    List<Object> readData = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
        .readAll();

    for (Object mapObj : readData) {
      LinkedHashMap<String, String> map = (LinkedHashMap<String, String>) mapObj;
      List<String> deleteList = new ArrayList<>();
      LinkedHashMap<String, String> insertMap = new LinkedHashMap<>();

      for (Object entObj : map.entrySet()) {
        Entry<String, String> entry = (Entry<String, String>) entObj;
        String oldKey = entry.getKey();
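        // "\\s+" matches any run of whitespace, newlines included; the
        // double backslash keeps the escape valid on every Java version
        String newKey = oldKey.replaceAll("\\s+", " ");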
        String value = entry.getValue();

        deleteList.add(oldKey);
        insertMap.put(newKey, value);
      }

      // Delete the old ...
      for (String oldKey : deleteList) {
        map.remove(oldKey);
      }

      // and bring in the new
      map.putAll(insertMap);
    }

    ObjectMapper mapper = new ObjectMapper();

    // Write JSON formatted data to output.json file
    mapper.writerWithDefaultPrettyPrinter().writeValue(output, readData);

    // Write JSON formatted data to stdout
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readData));
  }
}

It seems like there should be a better way to achieve this.
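One possibly tidier variant of the same idea is to build a fresh map per row instead of deleting and re-inserting keys in place. This is only a sketch: the names RowKeyNormalizer, normalizeKey and normalizeRows are hypothetical, and it assumes the rows are already available as a List<Map<String, String>> (the Jackson read itself is not shown). If two headers collapse to the same cleaned key, the later column overwrites the earlier one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowKeyNormalizer {

    // Collapse every run of whitespace (newlines included) to a single space.
    static String normalizeKey(String key) {
        return key.replaceAll("\\s+", " ").trim();
    }

    // Returns new rows with cleaned keys; the input rows are left untouched.
    static List<Map<String, String>> normalizeRows(List<Map<String, String>> rows) {
        List<Map<String, String>> result = new ArrayList<>(rows.size());
        for (Map<String, String> row : rows) {
            // LinkedHashMap keeps the original column order.
            Map<String, String> cleaned = new LinkedHashMap<>();
            for (Map.Entry<String, String> entry : row.entrySet()) {
                cleaned.put(normalizeKey(entry.getKey()), entry.getValue());
            }
            result.add(cleaned);
        }
        return result;
    }
}
```

The returned list can be handed to the ObjectMapper in place of the original rows.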

Upvotes: 0

Prem Kurian Philip

Reputation: 306

You can use the String replaceAll() method to replace all new lines with spaces.

String str = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll);
str = str.trim().replaceAll("\\s+", " ");
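One thing to keep in mind when post-processing the serialized string: JSON does not allow a raw newline inside a string, so a newline in a key is written as the two-character escape \n, while the real newline characters in the output come from the pretty printer. A sketch of targeting the escape instead follows. The name JsonEscapeDemo is hypothetical, and the approach is an assumption-laden shortcut: it also rewrites escaped newlines inside values, and it does not distinguish an escaped backslash that happens to be followed by the letter n.

```java
final class JsonEscapeDemo {

    // Replaces the JSON escape sequence \n (backslash + 'n') with a space.
    // String.replace works on literal text, so no regex escaping is involved;
    // "\\n" in Java source is exactly those two characters.
    static String replaceEscapedNewlines(String json) {
        return json.replace("\\n", " ");
    }
}
```

Real line breaks produced by the pretty printer are left alone, so the output stays formatted.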

Upvotes: 1
