Manish Kumar Sharma

Reputation: 13442

Remove duplicate text values from all JSON Arrays using Jackson

I have a JSON file that has several text arrays containing duplicate values. For ex:

{
    "mName": "Carl Sanchez",
    "mEmailID": "[email protected]",
    "mPhoneNo": 7954041324,

    "tutorTypes": [
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Coaching Institute Teacher ",
        " Corporate Professional ",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor",
        " Freelancer/Professional Tutor"
    ],
    "disciplines": [
        " Japanese",
        " German ",
        " Japanese",
        " German ",
        " Japanese",
        " Hindi ",
        " Japanese",
        " French "
    ]
}

I want to remove the duplicate text values from every array in the JSON source. In the above example, that means removing the duplicate languages and tutor types from the arrays. I also don't want to tie the code to any particular JSON field name; it should handle any array of text values in general. The desired output for the above example would be:

{
    "mName": "Carl Sanchez",
    "mEmailID": "[email protected]",
    "mPhoneNo": 7954041324,

    "tutorTypes": [
        " Freelancer/Professional Tutor",
        " Coaching Institute Teacher ",
        " Corporate Professional "
    ],
    "disciplines": [
        " Japanese",
        " German ",
        " Hindi ",
        " French "
    ]
}

The JSON input comes from a file and I want to write the output to a file. I have attempted a program to accomplish this using the Jackson data-binding API:

public static void removeDuplicateStringElementsFromAllArrays(String file) throws IOException {

        Writer fileWriter = new BufferedWriter(new FileWriter(new File("out.json")));

        JsonFactory f = new MappingJsonFactory();
        JsonParser jp = f.createJsonParser(new File(file));

        parse(jp, fileWriter);
    }

    private static void parse(JsonParser jp, Writer writer) throws IOException{
        JsonToken current;
        current = jp.nextToken();

        if(current != null){
            System.out.println(current.asString());
            writer.write(current.asString());
        }

        if(current == JsonToken.START_ARRAY){
            if(jp.nextTextValue() != null){
                JsonNode node = jp.readValueAsTree();
                // Trim the String values
                String[] values = ArraysUtil.trimArray("\"" , node.toString().split(","), "\"");
                // Ensure that there is no duplicate value
                values = new HashSet<String>(Arrays.asList(values)).toArray(new String[0]);
                // Finally, concatenate the values back and stash them to file
                String concatValue = String.join(",", values);

                // Write the concatenated values to file
                writer.write(concatValue);
            }
            else{
                parse(jp, writer);
            }
        }
        else{
            // Move on directly
            parse(jp, writer);
        }
    }

I am getting several nulls in the output. I have an idea of why this might be happening: when I call jp.nextTextValue(), the parser advances past the token, so constructing the value tree afterwards is probably off by one, but I am unable to figure out a workaround. Does anyone know how I might go about accomplishing the task?

EDIT:

Just want to add one thing here: I am using the Jackson data-binding API because it is built on the streaming API, which is efficient for parsing a large JSON source, which is my case. So, a solution that takes this into consideration would be appreciated.
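To be concrete, the per-array operation I am after is just order-preserving deduplication. Leaving Jackson aside for a moment, with plain collections it would look like this (class and method names are mine, for illustration only):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class Dedupe {

    // Distinct elements in first-occurrence order; a plain HashSet would
    // not guarantee the original order of the JSON array.
    static List<String> distinctPreservingOrder(List<String> values) {
        return new ArrayList<>(new LinkedHashSet<>(values));
    }

    public static void main(String[] args) {
        List<String> disciplines = List.of(
                " Japanese", " German ", " Japanese", " Hindi ", " French ");
        System.out.println(distinctPreservingOrder(disciplines));
    }
}
```

A LinkedHashSet rather than a HashSet, so the surviving values keep the order they had in the original array, as in the desired output above.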

Upvotes: 2

Views: 3182

Answers (2)

Ravi MCA

Reputation: 2621

Create a bean Contact.java and declare the properties you want deduplicated as Set.

When you deserialize the JSON into the bean, the Set does the job of removing the duplicates. No extra code is required.
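The deduplication itself comes from Set semantics: add() silently ignores an element that is already present, so duplicates vanish as the set is populated. A tiny stand-alone illustration (plain JDK, my own example):

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDedupeDemo {
    public static void main(String[] args) {
        Set<String> tutorTypes = new LinkedHashSet<>();
        tutorTypes.add(" Freelancer/Professional Tutor");
        tutorTypes.add(" Coaching Institute Teacher ");
        // duplicate: add() returns false and the set is unchanged
        tutorTypes.add(" Freelancer/Professional Tutor");
        System.out.println(tutorTypes.size()); // prints 2
    }
}
```

As far as I can tell, Jackson's default concrete type for a Set property is a LinkedHashSet, so the first-occurrence order of the array values is also preserved.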

package com.tmp;

import java.util.Set;

public class Contact {

    String      mName;
    String      mEmailID;
    long        mPhoneNo;

    Set<String> tutorTypes; // to remove duplicates
    Set<String> disciplines; // to remove duplicates

    // getter and setter methods go here...
}

Remove duplicates

package com.tmp;

import java.io.File;
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;


/**
 * 
 * @author Ravi P
 */
class Tmp {

    public static void main( String[] args ) throws IOException {

        ObjectMapper mapper = new ObjectMapper();

        Contact contact = mapper.readValue( new File( "D:\\tmp\\file.json" ), Contact.class );

        mapper.writeValue( new File( "D:\\tmp\\file1.json" ), contact );

    }
}

Upvotes: 3

Monish Sen

Reputation: 1888

Here is an example using json-simple. Note that it assumes the arrays are present at the root level and does not check for nested arrays in each parameter. You can add recursive logic if you want to support that.

package test.json.jsonsimple;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class App 
{
    @SuppressWarnings("unchecked")
    public static void main( String[] args )
    {
        System.out.println( "Hello World!" );

        JSONParser parser = new JSONParser();

        try {
            JSONObject outmap = new JSONObject();
            Object obj = parser.parse(new FileReader("d:\\in.json"));
            JSONObject jsonObject = (JSONObject) obj;
            for(Object o : jsonObject.entrySet()){
                if(o instanceof Map.Entry){
                    Map.Entry<String, Object> entry = (Map.Entry<String, Object>) o;
                    if(entry !=null ){
                        if(entry.getValue() instanceof JSONArray){
                            Set<String> uniqueValues = removeDuplicates(entry.getValue());
                            // Wrap in a JSONArray so toJSONString() serializes it as a JSON array
                            JSONArray dedupedArray = new JSONArray();
                            dedupedArray.addAll(uniqueValues);
                            outmap.put(entry.getKey(), dedupedArray);
                        }else{
                            outmap.put(entry.getKey(), entry.getValue());
                        }
                    }
                }
            }

            FileWriter file = new FileWriter("d:\\out.json");
            file.write(outmap.toJSONString());
            file.flush();
            file.close();

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }

    }

    @SuppressWarnings("unchecked")
    private static Set<String> removeDuplicates(Object value) {
        // LinkedHashSet drops duplicates while keeping the array's original order
        Set<String> outset = new java.util.LinkedHashSet<String>();
        JSONArray inset = (JSONArray) value;

        if (inset != null) {
            Iterator<String> iterator = inset.iterator();
            while (iterator.hasNext()) {
                outset.add(iterator.next());
            }
        }
        return outset;
    }
}
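The recursive extension mentioned above could be sketched like this (plain JDK collections, a hypothetical helper of my own; since json-simple's JSONObject and JSONArray are Map and List subtypes, the same walk applies to a parsed tree):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class RecursiveDedupe {

    // Walks a parsed JSON structure (Maps and Lists) and replaces every
    // list holding only strings with its order-preserving distinct elements.
    @SuppressWarnings("unchecked")
    static Object dedupe(Object node) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            ((Map<String, Object>) node).forEach((k, v) -> out.put(k, dedupe(v)));
            return out;
        }
        if (node instanceof List) {
            List<Object> list = (List<Object>) node;
            if (list.stream().allMatch(e -> e instanceof String)) {
                // pure text array: drop duplicates, keep first-occurrence order
                return new ArrayList<>(new LinkedHashSet<>(list));
            }
            List<Object> out = new ArrayList<>();
            for (Object e : list) {
                out.add(dedupe(e)); // recurse into nested arrays/objects
            }
            return out;
        }
        return node; // scalar: leave untouched
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("disciplines", List.of(" Japanese", " German ", " Japanese"));
        System.out.println(dedupe(doc));
    }
}
```

This leaves scalars and mixed-type arrays alone and only dedupes arrays consisting entirely of text values, matching the question's "any array of text values" requirement.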

Upvotes: 0
