Context: Application log files for external REST calls are generated daily. I am parsing these log files to extract information about each external REST call: the start time of the call, the end time when the response was received, the payloads, etc. The information for a particular session is collected and collated into a waterfall diagram to visualize how much time is spent in each call.
Problem: For one of the log files, the collated information for a particular session is larger than 16 MB (MongoDB's maximum BSON document size), so I have to use GridFS. The REST call information is collected in a custom object, which I serialized and uploaded to GridFS. In a Jupyter Notebook I extract the data, but it contains some additional characters. In the example below, I converted the object to a string (the idea was to use json.loads in Python), but when I extract it in Python I get the following.
Extraction:
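import gridfs
import javaobj.v2 as javaobj

# db is a pymongo Database; record is the session document holding the GridFS file id
fs = gridfs.GridFS(db, collection="Logs")
pObj = javaobj.loads(fs.get(record['calls']).read())
pObj.dump()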
The output of the dump() call is:
'[String 7e0000: \'[{"tokenId":"a6fc52b3-0c60-4332-b866-09f3fe8b243f","responseTime":540.2587,"requestPayload":null,"responsePayload":"{\\\\"Identifiers\\\\":[{\\\\"Name\\\\":\\\\"XXXX MANAGEMENT LLC-CISA\\\\",\\\\"Number\\\\":\\\\"0012345\\\\",\\\\"SubNumber\\\\":\\\\"00099\\\\",\\\\"Sourceode\\\\":null,\\\\"SourceGroups\\\\":null,\\\\"IsPrimary\\\\":false}],\\\\"serviceConsumerId\\\\":\\\\"YYY\\\\",\\\\"serviceConsumerSystemId\\\\":\\\\"001\\\\",\\\\"service ....
The string contains too many backslashes used as escape characters, and it starts with the text "String 7e0000". Obviously, json.loads in Python will fail on this.
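The "String 7e0000" prefix is consistent with the GridFS file holding a Java serialization stream rather than plain JSON text. ObjectOutputStream.writeObject(String) writes the stream magic 0xACED, version 5, a TC_STRING (0x74) or TC_LONGSTRING (0x7C) tag, a length, and the text in Java's "modified UTF-8"; 0x7E0000 is the first handle Java assigns to a serialized object, which javaobj appears to print next to the type. Some of the doubled backslashes may also come from dump() showing a quoted representation. A minimal sketch that unwraps such a blob in plain Python, assuming the file contains exactly one top-level String written by writeObject (the function names are hypothetical, and javaobj is not used):

```python
import struct

STREAM_MAGIC = b"\xac\xed\x00\x05"  # magic 0xACED followed by stream version 5
TC_STRING = 0x74      # string whose encoded length fits in 16 bits
TC_LONGSTRING = 0x7C  # string longer than 0xFFFF encoded bytes (e.g. a 16 MB payload)


def decode_modified_utf8(data: bytes) -> str:
    """Decode Java's 'modified UTF-8' into a Python str."""
    # U+0000 is written as C0 80; 0xC0 is never a continuation byte,
    # so this replacement cannot touch any other character.
    data = data.replace(b"\xc0\x80", b"\x00")
    # Supplementary characters are written as two 3-byte surrogates.
    # 'surrogatepass' lets them through; the UTF-16 round trip re-pairs them.
    text = data.decode("utf-8", "surrogatepass")
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


def unwrap_java_string(blob: bytes) -> str:
    """Return the String written by ObjectOutputStream.writeObject(String)."""
    if not blob.startswith(STREAM_MAGIC):
        raise ValueError("not a Java serialization stream")
    tag = blob[4]
    if tag == TC_STRING:
        (length,) = struct.unpack(">H", blob[5:7])   # unsigned 16-bit length
        start = 7
    elif tag == TC_LONGSTRING:
        (length,) = struct.unpack(">q", blob[5:13])  # signed 64-bit length
        start = 13
    else:
        raise ValueError(f"expected a top-level String, got tag 0x{tag:02x}")
    return decode_modified_utf8(blob[start:start + length])


# Hypothetical notebook usage:
# text = unwrap_java_string(fs.get(record["calls"]).read())
```

If the assumption holds, json.loads(text) should then produce a list of dicts, although each responsePayload is still a JSON-encoded string at that point.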
Below is the code that I used to serialize the object and store it.
Method to convert the custom objects to a String:
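// Joins the per-record strings into a bracketed, comma-separated list.
public static String convertToString(ArrayList<LogStructure> logs){
    StringBuilder sBuilder = new StringBuilder();
    sBuilder.append("[");
    for (int i = 0; i < logs.size(); i++){
        if (i > 0) {
            sBuilder.append(","); // separator only between elements
        }
        sBuilder.append(getString(logs.get(i)));
    }
    sBuilder.append("]"); // close the list
    return sBuilder.toString();
}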
// Builds a "{key:value,...}" string for one record. Keys and values are appended
// without quotes or escaping, so the result is not valid JSON.
public static String getString(LogStructure log){
    StringBuilder sBuilder = new StringBuilder();
    sBuilder.append("{logLevel:");
    sBuilder.append(log.logLevel);
    sBuilder.append(",logInjestionDateTime:");
    sBuilder.append(log.logInjestionDateTime);
    sBuilder.append(",serverName:");
    sBuilder.append(log.serverName);
    sBuilder.append(",serviceName:");
    sBuilder.append(log.serviceName);
    sBuilder.append(",serviceEndPoint:");
    sBuilder.append(log.serviceEndPoint);
    sBuilder.append(",startTime:");
    sBuilder.append(log.startTime);
    sBuilder.append(",endTime:");
    sBuilder.append(log.endTime);
    sBuilder.append(",responseTime:");
    sBuilder.append(log.responseTime);
    sBuilder.append(",message:");
    sBuilder.append(log.message);
    sBuilder.append(",requestId:");
    sBuilder.append(log.requestId);
    sBuilder.append(",tokenId:");
    sBuilder.append(log.tokenId);
    sBuilder.append(",userId:");
    sBuilder.append(log.userId);
    sBuilder.append(",requestPayload:");
    sBuilder.append(log.requestPayload);
    sBuilder.append(",responsePayload:");
    sBuilder.append(log.responsePayload);
    sBuilder.append(",payloadType:");
    sBuilder.append(log.payloadType);
    sBuilder.append("}");
    return sBuilder.toString();
}
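One reason a hand-built string like this is hard to parse: getString() appends keys and values without quotes or escaping, whereas a JSON encoder (JSONObject in the Java code, json.dumps in Python) quotes keys and string values and escapes any quotes inside them. A small Python sketch with a hypothetical record mirroring a few LogStructure fields (the values and helper names are illustrative only):

```python
import json
from typing import Any, Dict

# Hypothetical record mirroring a few LogStructure fields.
RECORD: Dict[str, Any] = {
    "tokenId": "a6fc52b3",
    "responseTime": 540.2587,
    "requestPayload": None,
    "responsePayload": '{"serviceConsumerId":"YYY"}',
}


def hand_built(rec: Dict[str, Any]) -> str:
    """Mimic getString(): append key:value pairs with no quoting or escaping."""
    parts = []
    for key, value in rec.items():
        # StringBuilder.append(null) writes "null", so mirror that for None.
        parts.append(f"{key}:{'null' if value is None else value}")
    return "{" + ",".join(parts) + "}"


def encoded(rec: Dict[str, Any]) -> str:
    """Let a JSON encoder quote keys and strings and escape embedded quotes."""
    return json.dumps(rec, separators=(",", ":"))


def is_valid_json(text: str) -> bool:
    """True if json.loads accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

In the encoded form, responsePayload appears as "{\"serviceConsumerId\":\"YYY\"}": one level of \" is the normal escaping for a string value that itself contains JSON, not corruption.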
Upload to GridFS in App.java:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectOutputStream;
import com.mongodb.client.gridfs.GridFSBucket;
import org.bson.types.ObjectId;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;

// Build one JSON object per REST call
JSONArray list = new JSONArray();
for (LogStructure temp : tempObject) {
    totalCalls++;
    JSONObject obj = new JSONObject();
    obj.put("logLevel", temp.logLevel);
    obj.put("logInjestionDateTime", temp.logInjestionDateTime);
    obj.put("serverName", temp.serverName);
    obj.put("serviceName", temp.serviceName);
    obj.put("serviceEndPoint", temp.serviceEndPoint);
    obj.put("startTime", temp.startTime);
    obj.put("endTime", temp.endTime);
    obj.put("responseTime", temp.responseTime);
    obj.put("message", temp.message);
    obj.put("requestId", temp.requestId);
    obj.put("tokenId", temp.tokenId);
    obj.put("requestPayload", temp.requestPayload);
    obj.put("responsePayload", temp.responsePayload);
    obj.put("payloadType", temp.payloadType);
    list.add(obj);
}

// Java-serialize the JSON text (writeObject on a String) into an in-memory buffer
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos;
try {
    oos = new ObjectOutputStream(baos);
    oos.writeObject(list.toJSONString());
    oos.flush();
    oos.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
InputStream is = new ByteArrayInputStream(baos.toByteArray());

// Upload the buffer to the "Logs" GridFS bucket and reference it from the session document
GridFSBucket gridFSBucket = mongo.getGridFsBucket(mDb, "Logs");
ObjectId objectId = mongo.uploadToGrid(is, gridFSBucket);
doc.append("totalCalls", totalCalls);
doc.append("calls", objectId);
mongo.insertDocument(mCollection, doc);
I referred to the links below to understand GridFS and object serialization:
https://mongodb.github.io/mongo-java-driver/3.5/driver/tutorials/gridfs/
https://howtodoinjava.com/java/serialization/custom-serialization-readobject-writeobject/
https://www.baeldung.com/java-serialization
With my current approach, the data that Java stores in MongoDB, once extracted, contains Java-specific text. For example, my first attempt was to serialize the ArrayList object as is, but when I extracted the data in Python, the string contained "System.*.ArrayList". Then I tried storing the data as a JSONObject using the json-simple library, but the default serialization contained the object's memory address. Finally, I used the toJSONString method and saved the result as a string, expecting the final string to follow the pattern
"{"key": [{"key1":value},{"key2":value}]}"
This would allow me to use the json.loads function and get a dictionary.
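Getting that pattern requires the GridFS file to contain the JSON text itself. A minimal sketch of the Python side, assuming (not the current code) that the Java upload wraps list.toJSONString().getBytes(StandardCharsets.UTF_8) in the ByteArrayInputStream directly instead of passing the string through ObjectOutputStream. parse_calls is a hypothetical helper; it also decodes the nested responsePayload, which arrives as a JSON-encoded string:

```python
import json
from typing import Any, Dict, List


def parse_calls(raw: bytes) -> List[Dict[str, Any]]:
    """Turn a GridFS file holding UTF-8 JSON text into a list of call dicts.

    Assumes the file contains the plain JSON array text (no Java
    serialization header), e.g. the UTF-8 bytes of list.toJSONString().
    """
    calls = json.loads(raw.decode("utf-8"))
    for call in calls:
        payload = call.get("responsePayload")
        # responsePayload is a JSON document stored as a string value,
        # so it needs its own json.loads to become a dict.
        if isinstance(payload, str):
            try:
                call["responsePayload"] = json.loads(payload)
            except json.JSONDecodeError:
                pass  # keep non-JSON payloads (e.g. XML or plain text) as strings
    return calls


# Hypothetical notebook usage, with fs a gridfs.GridFS(db, collection="Logs"):
# calls = parse_calls(fs.get(record["calls"]).read())
```

Decoding responsePayload a second time is what removes the remaining escaped quotes; payloads that are not JSON are left untouched rather than raising.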