user12432912

Reputation: 21

Query a large JSON file with Java

I am trying to parse the JSON file below using Java. I need to be able to search it (for example, by id), and the search should return the entire object. The file will be huge, and the search should still be time-efficient.


[
  {
    "id": 1,
    "name": "Mark Robb",
    "last_login": "2013-01-21T05:13:41 -11:30",
    "email": "[email protected]",
    "phone": "12345",
    "locations": [
        "Germany",
        "Austria"
    ]
  },
  {
    "id": 2,
    "name": "Matt Nish",
    "last_login": "2014-02-21T07:10:41 -11:30",
    "email": "[email protected]",
    "phone": "456123",
    "locations": [
        "France",
        "Italy"
    ]
  }
]


This is what I have tried so far using the Jackson library:

// Reads the whole file into a List<Customer>, then scans it linearly
public void findById(int id) throws IOException {
    List<Customer> customers = objectMapper.readValue(
            new File("src/main/resources/customers.json"),
            new TypeReference<List<Customer>>() {});

    for (Customer customer : customers) {
        if (customer.getId() == id) {
            System.out.println(customer.getName());
        }
    }
}

I just don't think this is an efficient approach for a huge JSON file (about 20,000 customers per file), and there could be multiple files. Search time should not grow linearly with the amount of data. How can I make this time-efficient? Should I use another library?

Upvotes: 1

Views: 2723

Answers (3)

AnatolyG

Reputation: 1587

The most efficient way to parse, in terms of both CPU and memory, is stream-oriented parsing instead of object mapping. It usually takes a bit more code, but it is usually worth it :) Both Gson and Jackson support this lightweight technique. You should also avoid memory allocation on the main (hot) path to prevent GC pauses. To illustrate the idea, I use a small GC-free library, https://github.com/anatolygudkov/green-jelly:

import org.green.jelly.*;    
import java.io.CharArrayReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class SelectById {
    public static class Customer {
        private long id;
        private String name;
        private String email;

        public void clear() {
            id = 0;
            name = null;
            email = null;
        }

        public Customer makeCopy() {
            Customer result = new Customer();
            result.id = id;
            result.name = name;
            result.email = email;
            return result;
        }

        @Override
        public String toString() {
            return "Customer{" +
                    "id=" + id +
                    ", name='" + name + '\'' +
                    ", email='" + email + '\'' +
                    '}';
        }
    }

    public static void main(String[] args) throws Exception {
        final String file = "\n" +
            "[\n" +
            "  {\n" +
            "    \"id\": 1,\n" +
            "    \"name\": \"Mark Robb\",\n" +
            "    \"last_login\": \"2013-01-21T05:13:41 -11:30\",\n" +
            "    \"email\": \"[email protected]\",\n" +
            "    \"phone\": \"12345\",\n" +
            "    \"locations\": [\n" +
            "        \"Germany\",\n" +
            "        \"Austria\"\n" +
            "    ]\n" +
            "},\n" +
            "  {\n" +
            "    \"id\": 2,\n" +
            "    \"name\": \"Matt Nish\",\n" +
            "    \"last_login\": \"2014-02-21T07:10:41 -11:30\",\n" +
            "    \"email\": \"[email protected]\",\n" +
            "    \"phone\": \"456123\",\n" +
            "    \"locations\": [\n" +
            "        \"France\",\n" +
            "        \"Italy\"\n" +
            "    ]\n" +
            " }\n" +
            "]\n";

        final List<Customer> selection = new ArrayList<>();

        final long selectionId = 2;

        final JsonParser parser = new JsonParser().setListener(
            new JsonParserListenerAdaptor() {
                private final Customer customer = new Customer();
                private String currentField;
                @Override
                public boolean onObjectStarted() {
                    customer.clear();
                    return true;
                }

                @Override
                public boolean onObjectMember(final CharSequence name) {
                    currentField = name.toString();
                    return true;
                }

                @Override
                public boolean onStringValue(final CharSequence data) {
                    switch (currentField) {
                        case "name":
                            customer.name = data.toString();
                            break;
                        case "email":
                            customer.email = data.toString();
                            break;
                    }
                    return true;
                }

                @Override
                public boolean onNumberValue(final JsonNumber number) {
                    if ("id".equals(currentField)) {
                        customer.id = number.mantissa();
                    }
                    return true;
                }

                @Override
                public boolean onObjectEnded() {
                    if (customer.id == selectionId) {
                        selection.add(customer.makeCopy());
                        return false; // we don't need to continue
                    }
                    return true;
                }
            }
        );

        // now let's read and parse the data with a buffer

        final CharArrayCharSequence buffer = new CharArrayCharSequence(1024);

        try (final Reader reader = new CharArrayReader(file.toCharArray())) { // replace by FileReader, for example
            int len;
            while((len = reader.read(buffer.getChars())) != -1) {
                buffer.setLength(len);
                parser.parse(buffer);
            }
        }
        parser.eoj();

        System.out.println(selection);
    }
}

This should be close to the fastest you can get in Java, given that we cannot use SIMD instructions directly. To eliminate memory allocation (and therefore GC pauses) on the main path entirely, replace the ".toString()" calls (each one creates a new String instance) with something reusable, such as a StringBuilder.
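
A minimal JDK-only sketch of that idea, assuming a callback shaped like onObjectMember(CharSequence) in the listener above: the member name is copied into one reused StringBuilder, and String.contentEquals(CharSequence) checks it without creating a new String. ReusableFieldName and its methods are hypothetical names, not part of green-jelly. Since a switch needs a String, the switch on currentField would turn into a chain of contentEquals checks.

```java
import java.util.List;

public class ReusableFieldName {
    // One buffer reused for every member name (hypothetical stand-in for the
    // listener's currentField), so no new String is created per field
    private final StringBuilder currentField = new StringBuilder(32);

    // Same shape as an onObjectMember(CharSequence) callback
    void onObjectMember(CharSequence name) {
        currentField.setLength(0); // reset the length, keep the backing array
        currentField.append(name); // copy the characters into the existing buffer
    }

    // String.contentEquals(CharSequence) compares characters without building a String
    boolean isField(String expected) {
        return expected.contentEquals(currentField);
    }

    public static void main(String[] args) {
        ReusableFieldName listener = new ReusableFieldName();
        for (String member : List.of("id", "name", "email")) {
            listener.onObjectMember(member);
            System.out.println(member + " is name: " + listener.isField("name"));
        }
    }
}
```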

The last thing that may affect overall performance is how the file is read, and RandomAccessFile is one of the best options we have in Java. Since your encoding appears to be ASCII, you can simply cast each byte to a char before passing the data to the JsonParser.
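
A minimal sketch of reading through RandomAccessFile and casting bytes to chars, assuming the input is pure ASCII (a multi-byte UTF-8 character would be split into several wrong chars by this cast). The ChunkHandler interface is a hypothetical stand-in for handing each chunk to the parser; both buffers are allocated once and reused for every read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiChunkReader {
    // Hypothetical callback standing in for parser.parse(...) on each chunk
    interface ChunkHandler {
        void onChunk(char[] chars, int length);
    }

    // Reads the file in fixed-size chunks and widens each byte to a char.
    // Assumes pure ASCII input; returns the total number of chars delivered.
    static long read(Path path, int bufferSize, ChunkHandler handler) throws IOException {
        byte[] bytes = new byte[bufferSize]; // allocated once, reused for every read
        char[] chars = new char[bufferSize];
        long total = 0;
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "r")) {
            int len;
            while ((len = file.read(bytes)) != -1) {
                for (int i = 0; i < len; i++) {
                    chars[i] = (char) bytes[i]; // correct for ASCII bytes 0..127
                }
                handler.onChunk(chars, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("customers", ".json");
        Files.write(tmp, "[{\"id\":1},{\"id\":2}]".getBytes(StandardCharsets.US_ASCII));
        long[] objects = {0};
        // A tiny buffer forces several chunks, as a large file would
        long total = read(tmp, 4, (chars, length) -> {
            for (int i = 0; i < length; i++) {
                if (chars[i] == '{') objects[0]++;
            }
        });
        System.out.println(total + " chars, " + objects[0] + " objects");
        Files.delete(tmp);
    }
}
```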

Upvotes: 1

Teocci

Reputation: 8895

You can try the Gson library. This library implements a TypeAdapter class that converts Java objects to and from JSON by streaming serialization and deserialization.

The API is efficient and flexible, especially for huge files. Here is an example:

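import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.lang.reflect.Type;
import java.util.List;
import java.util.stream.Collectors;

public class GsonStream {
    public static void main(String[] args) {
        Gson gson = new Gson();
        int id = 2; // the id to search for

        try (Reader reader = new FileReader("src/main/resources/customers.json")) {
            Type listType = new TypeToken<List<Customer>>(){}.getType();

            // Convert the JSON file to Java objects (the whole list is held in memory)
            List<Customer> customers = gson.fromJson(reader, listType);

            // Keep the names of the customers whose id matches
            List<String> names = customers
              .stream()
              .filter(c -> c.getId() == id)
              .map(Customer::getName)
              .collect(Collectors.toList());

            System.out.println(names);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}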

If you want to see how to override the abstract TypeAdapter class, here is an example:

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.TypeAdapter;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;

import java.io.FileReader;
import java.io.IOException;

public class GsonTypeAdapter {
    public static void main(String[] args) {
        Gson gson = new GsonBuilder()
                .registerTypeAdapter(Customer.class, new CustomerAdapter())
                .setPrettyPrinting()
                .create();
        // Adapter for Customer, backed by the CustomerAdapter registered above
        TypeAdapter<Customer> adapter = gson.getAdapter(Customer.class);

        try (JsonReader reader = new JsonReader(new FileReader("src/main/resources/customers.json"))) {
            // The file is a top-level array: read one customer at a time
            reader.beginArray();
            while (reader.hasNext()) {
                Customer customer = adapter.read(reader);
                System.out.println(customer);

                // Serialize it back to JSON through the same adapter
                System.out.println(gson.toJson(customer));
            }
            reader.endArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class CustomerAdapter extends TypeAdapter<Customer> {
    @Override
    public Customer read(JsonReader reader) throws IOException {
        Customer customer = new Customer();
        reader.beginObject();

        while (reader.hasNext()) {
            // Each member is a name followed by its value
            String fieldName = reader.nextName();
            switch (fieldName) {
                case "id":
                    customer.setId(reader.nextInt());
                    break;
                case "name":
                    customer.setName(reader.nextString());
                    break;
                default:
                    // Skip unmapped fields (last_login, email, phone, locations);
                    // without this the loop would never advance past them
                    reader.skipValue();
            }
        }
        reader.endObject();
        return customer;
    }

    @Override
    public void write(JsonWriter writer, Customer customer) throws IOException {
        writer.beginObject();
        writer.name("name");
        writer.value(customer.getName());
        writer.name("id");
        writer.value(customer.getId());
        writer.endObject();
    }
}

class Customer { 
   private int id; 
   private String name;  

   public int getId() { 
      return id; 
   } 

   public void setId(int id) { 
      this.id = id; 
   }  

   public String getName() { 
      return name; 
   }  

   public void setName(String name) { 
      this.name = name; 
   }   

   public String toString() { 
      return "Customer[ name = " + name + ", id: " + id + "]"; 
   } 
}

Upvotes: 0

prunge

Reputation: 23248

It should be possible to do this with Jackson. The trick is to use JsonParser to stream/parse the top-level array and then parse each record using ObjectMapper.readValue().

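import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;

public class StreamCustomers
{
    public static void main(String[] args) throws IOException
    {
        ObjectMapper objectMapper = new ObjectMapper();
        File file = new File("customers.json");

        try (JsonParser parser = objectMapper.getFactory().createParser(file))
        {
            //Assuming top-level array
            if (parser.nextToken() != JsonToken.START_ARRAY)
                throw new RuntimeException("Expected top-level array in JSON.");

            //Now inside the array, parse each record
            while (parser.nextToken() != JsonToken.END_ARRAY)
            {
                Customer customer = objectMapper.readValue(parser, Customer.class);

                //Do something with each customer as it is parsed
                System.out.println(customer.id + ": " + customer.name);
            }
        }
    }

    //Unknown fields (last_login, phone, locations) are skipped
    @JsonIgnoreProperties(ignoreUnknown = true)
    public static class Customer
    {
        public String id;
        public String name;
        public String email;
    }
}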

In terms of time efficiency, it still needs to scan the entire file; there is not much you can do about that without an index or something fancier like parallel parsing. But it is more memory-efficient than reading the entire JSON into memory: this code only holds one Customer object at a time.
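
If the same files are queried repeatedly, the index mentioned above is one way to stop lookup time from growing with the data: scan each file once (for example with the streaming loop above), put every customer into a HashMap keyed by id, and answer later lookups in average constant time. The same index can be filled from several files. A minimal sketch, assuming all customers fit in memory and the files do not change between lookups; CustomerIndex and its small Customer class are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CustomerIndex {
    // Minimal stand-in for the Customer class in the question
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Long, Customer> byId = new HashMap<>();

    // Called once per record while streaming each file (one linear pass, done once)
    void add(Customer customer) {
        byId.put(customer.id, customer);
    }

    // Average O(1) lookup, independent of how many customers were indexed
    Customer findById(long id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        CustomerIndex index = new CustomerIndex();
        for (Customer c : List.of(new Customer(1, "Mark Robb"), new Customer(2, "Matt Nish"))) {
            index.add(c);
        }
        System.out.println(index.findById(2).name); // Matt Nish
        System.out.println(index.findById(3));      // null: no such id
    }
}
```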


Also:

if(customer.getId() == id) {

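Use .equals() to compare strings, not ==: == compares object references, while .equals() compares contents. This applies when id is a String, as in the Customer class above; if getId() returns a primitive int, as in the question, == is correct: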

if (customer.getId().equals(id)) {
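
A small JDK-only demonstration of the difference: two strings with the same characters but created separately (as a parser would produce them) are different objects, so == reports false while equals() reports true. Primitives such as int are compared by value with ==.

```java
public class StringEquality {
    static boolean sameReference(String a, String b) {
        return a == b;      // identity: are these the same object?
    }

    static boolean sameContent(String a, String b) {
        return a.equals(b); // value: do they contain the same characters?
    }

    public static void main(String[] args) {
        String fromLiteral = "2";
        String fromParser = new String("2"); // a separate object with the same characters
        System.out.println(sameReference(fromLiteral, fromParser)); // false
        System.out.println(sameContent(fromLiteral, fromParser));   // true

        int a = 2, b = 2;
        System.out.println(a == b); // true: primitives are compared by value
    }
}
```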

Upvotes: 0
