Reputation: 932

Convert CSV file to JSON file with Apache Camel via List of Maps

The goal is to learn how to read a CSV file into a List of Maps and then how to marshal it to JSON.

Once I understand how to do that, I will be able to define more useful routes.

I define my routes in XML. Another restriction is that I do not want to create any transformation beans, only use the existing components.

My understanding obviously lacks some concept. I understand that you have to provide a bean as a consumer and can then pass it on, but what is wrong with the List of Maps that the documentation says the csv data format uses?

    <dataFormats>
        <json id="jack" library="Jackson"/>
    </dataFormats>  

    <route>
        <from uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleSmaller.csv"/>
        <unmarshal>
            <csv />
        </unmarshal>            
        <marshal ref="jack">                
        </marshal>
        <to uri="file:///C:/tries/collApp/exchange/out?fileName=out.json"/>          
    </route>

This silently does nothing. All I can see is the lock file appearing and disappearing.

Thanks!

P.S. I plan to create two routes. The first will read a CSV file, transform it by reshaping its flat structure into that of my persistent beans, and then pass it to my beans. The second will just save my beans as JSON, which seems to be the easy part. But first I need to get this working to understand how it works.

Upvotes: 1

Answers (3)

fedd

Reputation: 932

I am providing an answer as I have moved forward.

I was on the right track; there were just small errors. One was noticed by Jérémie B in the comments on the original question.

It failed silently because I had not enabled logging. I enabled it by adding slf4j to my pom.xml like this:

    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>${slf4j-version}</version>
    </dependency>    
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-jdk14</artifactId>
        <version>${slf4j-version}</version>
    </dependency>    
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>jcl-over-slf4j</artifactId>
        <version>${slf4j-version}</version>
    </dependency>

I saw numerous errors, and even buggy Camel behaviour, but I have managed to make this route work:

    <dataFormats>
        <json id="jack" library="Jackson" prettyPrint="true"/>
    </dataFormats>       

    <route>

        <from uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleUtf.csv&amp;charset=UTF-8"/>
        <log message="file: ${body.class.name} ${body}" loggingLevel="WARN"/>
        <unmarshal>
            <csv delimiter=";"  useMaps="true" />
        </unmarshal>           
        <log message="unmarshalled: ${body.class.name} ${body}" loggingLevel="WARN"/>
        <marshal ref="jack"/>
        <log message="marshalled: ${body}" loggingLevel="WARN"/>
        <to uri="file:///C:/tries/collApp/exchange/out?fileName=out.json"/>         
    </route>

So basically, after cleaning up the typos, I had to:

specify the input file charset,
specify the delimiter that Excel used when creating my CSV file,
tell the csv data format to put the rows into Maps.
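
To make the last point concrete, here is a minimal, stdlib-only Java sketch of the shape that `useMaps="true"` is expected to produce: a List with one Map per data row, keyed by the header names. It is an illustration only, not Camel's implementation. The class name `CsvToMapsSketch`, the naive split-based parsing (no quoting support), and the sample data are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Illustration only: this is NOT Camel's CSV data format, just the
// "list of maps" shape that the useMaps option is about.
class CsvToMapsSketch {

    // Splits naive CSV text (no quoted fields) into one map per data row,
    // keyed by the column names found in the first line.
    static List<Map<String, String>> parse(String csv, String delimiter) {
        String[] lines = csv.split("\\r?\\n");
        String sep = Pattern.quote(delimiter);
        String[] header = lines[0].split(sep, -1);

        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].split(sep, -1);
            Map<String, String> row = new LinkedHashMap<>(); // keeps column order
            for (int c = 0; c < header.length && c < cells.length; c++) {
                row.put(header[c], cells[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data using the same ';' delimiter as the route.
        String csv = "LAST_NAME;FIRST_NAME\r\nIvanov;Ivan\r\nPetrov;Petr\r\n";
        System.out.println(parse(csv, ";"));
        // prints [{LAST_NAME=Ivanov, FIRST_NAME=Ivan}, {LAST_NAME=Petrov, FIRST_NAME=Petr}]
    }
}
```

A JSON library such as Jackson can serialize a structure like this directly as an array of objects, which is why no transformation bean is needed between the two steps.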

Unfortunately, the route above does not work, possibly due to a Camel bug which I reported to the developer community (no reaction so far: http://camel.465427.n5.nabble.com/A-possible-bug-in-IOConverter-with-Win-1251-charset-td5778665.html).

I have moved forward nonetheless, and I am probably now bypassing Camel's flawed IOConverter. Currently I am at this stage (this is not an answer to the question, just an illustration of how handy Camel can be):

    <route>
        <from uri="file:///C:/tries/collApp/exchange/in?fileName=registerSampleSmaller1.csv&amp;charset=windows-1251"/>
        <split streaming="true">
            <method ref="csvSplitter" method="tokenizeReader"/>  <!-- prepends the first line of the file to every subsequent line -->
            <log message="splitted: ${body}" loggingLevel="DEBUG"/>
            <unmarshal>
                <csv delimiter=";"  useMaps="true" />
            </unmarshal>            
            <log message="unmarshalled: size: ${body.size()}, ${body}" loggingLevel="DEBUG"/>
            <filter>
                <simple>${body.size()} == 1</simple><!-- be sure to have spaces around an operator -->
                <log message="filtered: listItem: ${body[0]['PATRONYMIC']}, list: ${body}" loggingLevel="DEBUG"/>
                <transform>
                    <spel>#{
                        {
                        lastName:body[0]['LAST_NAME'],
                        firstName: body[0]['FIRST_NAME'],
                        patronymic: body[0]['PATRONYMIC'],
                        comment:body[0]['COMMENT6']
                        }
                        }</spel><!-- splitting the SpEL {:} map creation notation across multiple lines is crucial -->
                </transform>                
                <log message="transformed: ${body}" loggingLevel="DEBUG"/>
                <marshal ref="jack"/>
                <log message="marshalled: ${body}" loggingLevel="DEBUG"/>
                <to uri="file:///C:/tries/collApp/exchange/out?fileName=out${exchangeProperty.CamelSplitIndex}.json"/>          
            </filter>
        </split>
    </route>

I had to write my own CSV splitter (one that respects all Unicode code points, etc.). It basically prepends the first line(s) to every subsequent line, so now I am able to split a CSV file into a set of JSON files in a streaming manner, or to handle the objects differently instead of marshalling them.

**Update: csvSplitter code**

ReaderTokenizer, an iterator around a reader:

    import java.io.IOException;
    import java.io.Reader;
    import java.util.Iterator;
    import java.util.NoSuchElementException;

    public class ReaderTokenizer implements Iterator<String> {

        private String _curString = null;
        private boolean _endReached = false;
        private final Reader _reader;
        private char[] _token;

        public ReaderTokenizer(Reader reader, String token) {
            setToken(token);
            _reader = reader;
        }

        public final void setToken(String token) {
            _token = token.toCharArray();
            if (_token.length == 0) {
                throw new IllegalArgumentException("Can't tokenize with the empty string");
            }
        }

        // Reads characters up to and including the next delimiter, or up to the end of the stream.
        private void _readNextToken() throws IOException {
            int curCharInt;
            char previousChar = (char) -1;
            int tokenPos = 0;
            StringBuilder sb = new StringBuilder(255);

            while (true) {
                curCharInt = _reader.read();
                if (curCharInt == -1) {
                    _endReached = true;
                    _reader.close();
                    break;
                }
                if (curCharInt == _token[tokenPos]) {
                    // Never start a match on the second half of a surrogate pair.
                    if (tokenPos != 0 || !Character.isHighSurrogate(previousChar)) {
                        tokenPos++;
                        if (tokenPos >= _token.length) {
                            // Full delimiter seen: keep it in the chunk and stop.
                            sb.append((char) curCharInt);
                            break;
                        }
                    }
                } else {
                    // Mismatch: restart the delimiter match (simple restart, not a full KMP fallback).
                    tokenPos = (curCharInt == _token[0]) ? 1 : 0;
                }

                previousChar = (char) curCharInt;
                sb.append(previousChar);
            }
            // Do not report an empty trailing chunk when the stream ends right after a delimiter.
            _curString = (_endReached && sb.length() == 0) ? null : sb.toString();
        }

        @Override
        public boolean hasNext() {
            if (_curString != null) {
                return true;
            }
            if (_endReached) {
                return false;
            }
            try {
                _readNextToken();
            } catch (IOException ex) {
                throw new RuntimeException(ex);
            }
            return _curString != null;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String ret = _curString;
            _curString = null;
            return ret;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }

    }

The splitter itself:

    import java.io.IOException;
    import java.io.Reader;
    import java.util.Iterator;

    public class CamelReaderSplitter {

        private final String _token;
        private final int _headerLinesNumber;

        public CamelReaderSplitter(String token, int headerLinesNumber) {
            _token = token;
            _headerLinesNumber = headerLinesNumber;
        }

        public CamelReaderSplitter(String token) {
            this(token, 1);
        }

        public CamelReaderSplitter(int headerLinesNumber) {
            this("\r\n", headerLinesNumber);
        }

        public CamelReaderSplitter() {
            this("\r\n", 1);
        }

        // Returns an iterator whose every element is the header line(s) followed by one data line.
        public Iterator<String> tokenizeReader(final Reader reader) throws IOException {

            return new ReaderTokenizer(reader, _token) {

                private final String _firstLines;

                {
                    // Consume the header line(s) once, up front.
                    StringBuilder sb = new StringBuilder();
                    for (int i = 0; i < _headerLinesNumber; i++) {
                        if (super.hasNext()) {
                            sb.append(super.next());
                        }
                    }
                    _firstLines = sb.toString();
                }

                @Override
                public String next() {
                    return _firstLines + super.next();
                }

            };

        }

    }

Upvotes: 1

Souciance Eqdam Rashti

Reputation: 3191

Why do you want list of maps? Why not a list?

Unmarshal the CSV to a list of POJOs using Bindy.
Marshal the list to JSON using either annotated classes or standard Jackson code.

Let me know if you want code samples. The main idea is to always check the body after unmarshalling the CSV. You will most likely get a list of POJOs. Then just iterate over the list and, for each POJO, call the getters and set the values of the JSON fields.
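
As a rough sketch of that last step, the following stdlib-only Java example iterates over a list of POJOs, reads each getter, and writes the values out as JSON fields. It deliberately uses a small hand-rolled JSON writer as a stand-in for Jackson so that it runs without extra dependencies. The `Person` class, its two fields, and the class name `PojoToJsonSketch` are hypothetical; in a real route the POJO would be produced by Bindy and carry its `@CsvRecord` and `@DataField` annotations.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a hand-rolled stand-in for what Jackson would do.
class PojoToJsonSketch {

    // Hypothetical POJO; with Bindy it would carry @CsvRecord/@DataField annotations.
    static class Person {
        private final String lastName;
        private final String firstName;

        Person(String lastName, String firstName) {
            this.lastName = lastName;
            this.firstName = firstName;
        }

        String getLastName() { return lastName; }
        String getFirstName() { return firstName; }
    }

    // Wraps a string in double quotes and escapes the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (ch < 0x20) {
                        sb.append(String.format("\\u%04x", (int) ch));
                    } else {
                        sb.append(ch);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Iterates over the list and emits one JSON object per POJO.
    static String toJson(List<Person> people) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < people.size(); i++) {
            Person p = people.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"lastName\":").append(quote(p.getLastName()))
              .append(",\"firstName\":").append(quote(p.getFirstName()))
              .append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Ivanov", "Ivan"));
        System.out.println(toJson(people));
        // prints [{"lastName":"Ivanov","firstName":"Ivan"}]
    }
}
```

In practice a JSON library would replace `quote` and `toJson`; the point here is only the iterate-and-map step.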

Upvotes: 1

Nderon Hyseni

Reputation: 247

Please take a look at the Data Format page [1]. You can use the marshal DSL described there to turn the object into a String in the format you want.

[1] http://camel.apache.org/data-format.html

Upvotes: 1
