Reputation: 21
I am trying to parse the JSON file below using Java. I need to be able to search for a customer, and the search should return the entire object. The file will be huge, and the search should still be time efficient.
[
{
"id": 1,
"name": "Mark Robb",
"last_login": "2013-01-21T05:13:41 -11:30",
"email": "[email protected]",
"phone": "12345",
"locations": [
"Germany",
"Austria"
]
},
{
"id": 2,
"name": "Matt Nish",
"last_login": "2014-02-21T07:10:41 -11:30",
"email": "[email protected]",
"phone": "456123",
"locations": [
"France",
"Italy"
]
}
]
This is what I have tried so far using the Jackson library.
public void findById(int id) throws IOException {
    // Reads the whole file into a list, then scans it linearly
    List<Customer> customers = objectMapper.readValue(
            new File("src/main/resources/customers.json"),
            new TypeReference<List<Customer>>() {});
    for (Customer customer : customers) {
        if (customer.getId() == id) {
            System.out.println(customer.getName());
        }
    }
}
I just don't think this is an efficient method for a huge JSON file (about 20000 customers per file), and there could be multiple files. Search time should not increase linearly. How can I make this time efficient? Should I use another library?
Upvotes: 1
Views: 2723
Reputation: 1587
The most efficient way to parse, in terms of both CPU and memory, is stream-oriented parsing instead of object mapping. It usually takes a bit more code, but it is usually a good deal :) Both Gson and Jackson support this lightweight technique. You should also avoid memory allocation in the main/hot path to prevent GC pauses. To illustrate the idea, I use a small GC-free library, https://github.com/anatolygudkov/green-jelly:
import org.green.jelly.*;

import java.io.CharArrayReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

public class SelectById {
    public static class Customer {
        private long id;
        private String name;
        private String email;

        public void clear() {
            id = 0;
            name = null;
            email = null;
        }

        public Customer makeCopy() {
            Customer result = new Customer();
            result.id = id;
            result.name = name;
            result.email = email;
            return result;
        }

        @Override
        public String toString() {
            return "Customer{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", email='" + email + '\'' +
                '}';
        }
    }

    public static void main(String[] args) throws Exception {
        final String file = "\n" +
            "[\n" +
            " {\n" +
            " \"id\": 1,\n" +
            " \"name\": \"Mark Robb\",\n" +
            " \"last_login\": \"2013-01-21T05:13:41 -11:30\",\n" +
            " \"email\": \"[email protected]\",\n" +
            " \"phone\": \"12345\",\n" +
            " \"locations\": [\n" +
            " \"Germany\",\n" +
            " \"Austria\"\n" +
            " ]\n" +
            "},\n" +
            " {\n" +
            " \"id\": 2,\n" +
            " \"name\": \"Matt Nish\",\n" +
            " \"last_login\": \"2014-02-21T07:10:41 -11:30\",\n" +
            " \"email\": \"[email protected]\",\n" +
            " \"phone\": \"456123\",\n" +
            " \"locations\": [\n" +
            " \"France\",\n" +
            " \"Italy\"\n" +
            " ]\n" +
            " }\n" +
            "]\n";

        final List<Customer> selection = new ArrayList<>();
        final long selectionId = 2;

        final JsonParser parser = new JsonParser().setListener(
            new JsonParserListenerAdaptor() {
                private final Customer customer = new Customer();
                private String currentField;

                @Override
                public boolean onObjectStarted() {
                    customer.clear();
                    return true;
                }

                @Override
                public boolean onObjectMember(final CharSequence name) {
                    currentField = name.toString();
                    return true;
                }

                @Override
                public boolean onStringValue(final CharSequence data) {
                    switch (currentField) {
                        case "name":
                            customer.name = data.toString();
                            break;
                        case "email":
                            customer.email = data.toString();
                            break;
                    }
                    return true;
                }

                @Override
                public boolean onNumberValue(final JsonNumber number) {
                    if ("id".equals(currentField)) {
                        customer.id = number.mantissa();
                    }
                    return true;
                }

                @Override
                public boolean onObjectEnded() {
                    if (customer.id == selectionId) {
                        selection.add(customer.makeCopy());
                        return false; // we don't need to continue
                    }
                    return true;
                }
            }
        );

        // now let's read and parse the data with a buffer
        final CharArrayCharSequence buffer = new CharArrayCharSequence(1024);
        try (final Reader reader = new CharArrayReader(file.toCharArray())) { // replace by FileReader, for example
            int len;
            while ((len = reader.read(buffer.getChars())) != -1) {
                buffer.setLength(len);
                parser.parse(buffer);
            }
        }
        parser.eoj();

        System.out.println(selection);
    }
}
This should work almost as fast as is possible in Java (given that we cannot use SIMD instructions directly). To get rid of memory allocation (and GC pauses) in the main path entirely, replace ".toString()" (which creates a new String instance) with something reusable, like a StringBuilder.
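One way to read the ".toString()" advice, as a minimal sketch and not green-jelly's own API: keep one StringBuilder per field and refill it from the CharSequence the listener receives. The ReusableCustomer class and its method names are hypothetical; String.contentEquals is standard JDK and compares the field name without building a new String.

```java
// Hypothetical holder that is reused for every parsed object instead of
// creating new String instances; the names are illustrative, not library API.
final class ReusableCustomer {
    long id;
    final StringBuilder name = new StringBuilder(64);
    final StringBuilder email = new StringBuilder(64);

    // Resets the holder; setLength(0) keeps the backing array for reuse.
    void clear() {
        id = 0;
        name.setLength(0);
        email.setLength(0);
    }

    // Copies a string value into the buffer of the matching field.
    // Both arguments are CharSequences, as a streaming parser would pass them.
    void setString(CharSequence field, CharSequence value) {
        if ("name".contentEquals(field)) {
            name.setLength(0);
            name.append(value);
        } else if ("email".contentEquals(field)) {
            email.setLength(0);
            email.append(value);
        }
        // any other field is ignored
    }

    public static void main(String[] args) {
        ReusableCustomer customer = new ReusableCustomer();
        customer.setString("name", "Matt Nish");
        System.out.println(customer.name);
        customer.clear();
        customer.setString("name", "Mark Robb");
        System.out.println(customer.name);
    }
}
```

The buffers only grow when a value is longer than anything seen before, so after a warm-up phase the hot path should stop allocating for these fields. The current field name would need the same treatment if it is also kept between callbacks.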
The last thing that may affect overall performance is how the file is read. RandomAccessFile is one of the best options we have in Java. Since your encoding seems to be ASCII, just cast each byte to char before passing it to the JsonParser.
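The byte-to-char idea could look roughly like the sketch below. It assumes the file really is pure ASCII; AsciiChunkReader and ChunkSink are hypothetical names, and the sink stands in for whatever consumes the characters (in the answer's loop, that would be the parser's buffer).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class AsciiChunkReader {
    // Hypothetical callback; a primitive int avoids boxing the length.
    interface ChunkSink {
        void accept(char[] chars, int length);
    }

    // Reads the file in fixed-size chunks and widens each byte to a char.
    // The two arrays are allocated once and reused for every chunk.
    static void readAscii(String path, int bufferSize, ChunkSink sink) throws IOException {
        final byte[] bytes = new byte[bufferSize];
        final char[] chars = new char[bufferSize];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            int len;
            while ((len = file.read(bytes)) != -1) {
                for (int i = 0; i < len; i++) {
                    // The mask keeps bytes above 127 from becoming negative values
                    chars[i] = (char) (bytes[i] & 0xFF);
                }
                sink.accept(chars, len);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringBuilder text = new StringBuilder();
        readAscii(args[0], 1024, (chars, len) -> text.append(chars, 0, len));
        System.out.println(text.length() + " chars read");
    }
}
```

If the file may contain non-ASCII text (UTF-8 names, for example), this shortcut is not safe and a proper decoder is needed instead.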
Upvotes: 1
Reputation: 8895
You can try the Gson library. It provides a TypeAdapter class that converts Java objects to and from JSON by streaming serialization and deserialization. The API is efficient and flexible, especially for huge files. Here is an example:
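import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.lang.reflect.Type;
import java.util.List;
import java.util.stream.Collectors;

public class GsonStream {
    public static void main(String[] args) {
        int id = 2; // id to search for
        Gson gson = new Gson();
        try (Reader reader = new FileReader("src/main/resources/customers.json")) {
            Type listType = new TypeToken<List<Customer>>(){}.getType();
            // Convert the JSON file to Java objects
            List<Customer> customers = gson.fromJson(reader, listType);
            // Collect the names of the customers whose id matches
            List<String> names = customers
                    .stream()
                    .filter(c -> c.getId() == id)
                    .map(Customer::getName)
                    .collect(Collectors.toList());
            System.out.println(names);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}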
If you want to see how to extend the TypeAdapter abstract class, here is an example:
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.TypeAdapter;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;
import java.io.FileReader;
import java.io.IOException;
import java.lang.reflect.Type;
import java.util.List;

public class GsonTypeAdapter {
    public static void main(String[] args) {
        GsonBuilder builder = new GsonBuilder();
        builder.registerTypeAdapter(Customer.class, new CustomerAdapter());
        builder.setPrettyPrinting();
        Gson gson = builder.create();
        try (JsonReader reader = new JsonReader(new FileReader("src/main/resources/customers.json"))) {
            // The file holds a top-level array, so read a list of customers
            Type listType = new TypeToken<List<Customer>>(){}.getType();
            List<Customer> customers = gson.fromJson(reader, listType);
            System.out.println(customers);
            String jsonString = gson.toJson(customers, listType);
            System.out.println(jsonString);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class CustomerAdapter extends TypeAdapter<Customer> {
    @Override
    public Customer read(JsonReader reader) throws IOException {
        Customer customer = new Customer();
        reader.beginObject();
        while (reader.hasNext()) {
            // Every member starts with its name, followed by its value
            String fieldName = reader.nextName();
            switch (fieldName) {
                case "name":
                    customer.setName(reader.nextString());
                    break;
                case "id":
                    customer.setId(reader.nextInt());
                    break;
                default:
                    // Skip members that Customer does not map (email, phone, locations, ...)
                    reader.skipValue();
                    break;
            }
        }
        reader.endObject();
        return customer;
    }

    @Override
    public void write(JsonWriter writer, Customer customer) throws IOException {
        writer.beginObject();
        writer.name("name");
        writer.value(customer.getName());
        writer.name("id");
        writer.value(customer.getId());
        writer.endObject();
    }
}

class Customer {
    private int id;
    private String name;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "Customer[ name = " + name + ", id: " + id + "]";
    }
}
Upvotes: 0
Reputation: 23248
It should be possible to do this with Jackson. The trick is to use JsonParser to stream/parse the top-level array and then parse each record using ObjectMapper.readValue().
ObjectMapper objectMapper = new ObjectMapper();
File file = new File("customers.json");
try (JsonParser parser = objectMapper.getFactory().createParser(file))
{
    //Assuming top-level array
    if (parser.nextToken() != JsonToken.START_ARRAY)
        throw new RuntimeException("Expected top-level array in JSON.");
    //Now inside the array, parse each record
    while (parser.nextToken() != JsonToken.END_ARRAY)
    {
        Customer customer = objectMapper.readValue(parser, Customer.class);
        //Do something with each customer as it is parsed
        System.out.println(customer.id + ": " + customer.name);
    }
}

@JsonIgnoreProperties(ignoreUnknown = true)
public static class Customer
{
    public String id;
    public String name;
    public String email;
}
In terms of time efficiency, it will still need to scan the entire file. There is not much you can do about that without an index or something fancier like parallel parsing. But it will be more memory efficient than reading the entire JSON into memory: this code only loads one Customer object at a time.
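If the same data is searched many times, the "index" can be as simple as a HashMap filled during one streaming pass, so that each later lookup is a hash lookup instead of a scan. This is a minimal sketch under that assumption; CustomerIndex and its methods are hypothetical names, and the nested Customer is a cut-down stand-in for the real class.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CustomerIndex {
    // Cut-down stand-in for the real Customer class
    static final class Customer {
        final int id;
        final String name;

        Customer(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Integer, Customer> byId = new HashMap<>();

    // Called once per parsed record, for example from inside the streaming loop
    void add(Customer customer) {
        byId.put(customer.id, customer);
    }

    // Returns the whole object, or null when the id is unknown
    Customer findById(int id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        CustomerIndex index = new CustomerIndex();
        List<Customer> parsed = Arrays.asList(
                new Customer(1, "Mark Robb"),
                new Customer(2, "Matt Nish"));
        for (Customer customer : parsed) {
            index.add(customer);
        }
        System.out.println(index.findById(2).name);
    }
}
```

The trade-off is memory: the index keeps every customer (or at least every key) in memory, which is likely fine for about 20000 records per file but should be measured for much larger inputs. Records from several files can be added to the same index.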
Also, if the id is a String, as in the Customer class above, use .equals() for comparing strings, not ==. Instead of:
if(customer.getId() == id) {
write:
if (customer.getId().equals(id)) {
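A small self-contained sketch of why this matters for String ids: == compares object references, while equals compares contents. StringCompare and sameId are hypothetical names used only for illustration.

```java
final class StringCompare {
    // Null-safe content comparison of two ids
    static boolean sameId(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        String parsed = new String("2"); // a distinct object, like a value produced by a parser
        String wanted = "2";
        System.out.println(parsed == wanted);       // prints "false": different references
        System.out.println(sameId(parsed, wanted)); // prints "true": same contents
    }
}
```

For primitive int ids, as in the question's findById(int id), == is the right comparison.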
Upvotes: 0