I am trying to serialise and store objects/data in a database efficiently. The object can take any form, but in most cases it will be an instance of a class that has a primitive counterpart (such as Integer). I have written the following methods to marshal and unmarshal:
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

private String marshall(Object obj) throws IOException {
    if (obj instanceof String) {
        return (String) obj;
    } else if (obj instanceof Integer || obj instanceof Byte || obj instanceof Short
            || obj instanceof Long || obj instanceof Float || obj instanceof Double
            || obj instanceof Boolean || obj instanceof Character) {
        // Wrapper types are stored as plain text and parsed back in unmarshall().
        return obj.toString();
    } else {
        // Anything else is Java-serialized, then Base64-encoded so it fits in a text column.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        }
        // Base64Coder is a third-party helper class, not part of the JDK.
        return new String(Base64Coder.encode(baos.toByteArray()));
    }
}

private Object unmarshall(String str, Class<?> type) throws IOException, ClassNotFoundException {
    if (type.equals(Integer.class)) {
        return Integer.parseInt(str);
    } else if (type.equals(String.class)) {
        return str;
    } else if (type.equals(Byte.class)) {
        return Byte.parseByte(str);
    } else if (type.equals(Short.class)) {
        return Short.parseShort(str);
    } else if (type.equals(Long.class)) {
        return Long.parseLong(str);
    } else if (type.equals(Float.class)) {
        return Float.parseFloat(str);
    } else if (type.equals(Double.class)) {
        return Double.parseDouble(str);
    } else if (type.equals(Boolean.class)) {
        return Boolean.parseBoolean(str);
    } else if (type.equals(Character.class)) {
        return str.charAt(0);
    } else {
        // Reverse of marshall(): Base64-decode, then deserialize.
        byte[] data = Base64Coder.decode(str);
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }
}
```
These methods work fine (or at least my JUnit test seems to think they do), but I am wondering what the best way to store the output would be. The two options I see are LONGTEXT and BLOB (strictly LONGBLOB, since a plain MySQL BLOB holds at most 64 KB), and I can see advantages to both. From what I have researched, LONGTEXT and LONGBLOB both have a maximum length of 4 GB minus 1 byte. BLOBs are not searchable, but they store the data passed to them byte for byte (which may or may not be an advantage; I am not sure). LONGTEXT is searchable, and if I could change the encoding from UTF-8 to something closer to Base64 (if you know which encoding would be best, let me know), it might be more space-efficient than the BLOB encoding, which at the moment is UTF-8 and reversible via CONVERT(value USING utf8).
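To make the space question concrete, here is a minimal sketch that serializes a sample object and compares the raw byte count (what a LONGBLOB would hold) with the length of its Base64 text (what a LONGTEXT would hold). It assumes Java 8+ and uses the JDK's java.util.Base64 in place of the third-party Base64Coder from the question; the class name Base64Overhead is hypothetical, and checked exceptions are wrapped in UncheckedIOException only to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;

final class Base64Overhead {

    // Raw Java-serialized bytes: what a LONGBLOB column would store.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Base64 text of the same bytes: what a LONGTEXT column would store.
    // Uses the JDK encoder (padded, no line breaks) instead of Base64Coder.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        ArrayList<Integer> sample = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            sample.add(i);
        }
        byte[] raw = serialize(sample);
        String text = toBase64(raw);
        // Base64 output is ASCII, so each character occupies one byte in UTF-8.
        int textBytes = text.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("raw bytes:    " + raw.length);
        System.out.println("base64 bytes: " + textBytes);
    }
}
```

Whatever the object, the Base64 form is 4 * ceil(n / 3) characters for n raw bytes, so no text encoding of it can beat storing the n bytes directly.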
Another option I saw was to store both: a rawValue column that just holds the result of calling .toString() on whatever is being stored, and a value column that is a BLOB (or even LONGTEXT if appropriate). This would provide searchable data in rawValue and an object representation in value. I am not sure whether this would be beneficial in the long run, but it would make it easier for third parties to access the database and to read the data from other languages such as PHP.
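The two-column idea could look roughly like the sketch below. The table stored_values and its columns raw_value (LONGTEXT) and serialized_value (LONGBLOB) are hypothetical names, and the sketch assumes a plain JDBC Connection to MySQL. The readable column uses String.valueOf (that is, toString()), while the binary column holds the Java-serialized bytes without any Base64 step.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class DualColumnStore {

    // Hypothetical schema:
    //   CREATE TABLE stored_values (
    //     id               BIGINT AUTO_INCREMENT PRIMARY KEY,
    //     raw_value        LONGTEXT,  -- toString() form, searchable, readable from PHP
    //     serialized_value LONGBLOB   -- Java-serialized bytes, no Base64
    //   );
    static final String INSERT_SQL =
            "INSERT INTO stored_values (raw_value, serialized_value) VALUES (?, ?)";

    // Human-readable form; String.valueOf also copes with null.
    static String rawValue(Object obj) {
        return String.valueOf(obj);
    }

    // Exact Java representation, for restoring the object later.
    static byte[] serializedValue(Serializable obj) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Writes both representations of obj into one row.
    static void store(Connection conn, Serializable obj) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, rawValue(obj));
            ps.setBytes(2, serializedValue(obj));
            ps.executeUpdate();
        }
    }
}
```

The raw_value column is only as useful as each class's toString() output: for wrapper types it is exact, but for arbitrary objects it may be neither searchable nor meaningful to a PHP reader.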
I am open to suggestions for completely different approaches if you think there is something better for this scenario.
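The most space-efficient method is BLOB, without the conversion you mention. Serialized data is binary, and I don't see much value in being able to search it. Base64 and similar encodings are not more space-efficient: Base64 turns every 3 bytes into 4 characters, so the stored value grows by about a third.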
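A minimal sketch of what this answer describes: write the ObjectOutputStream bytes straight into a LONGBLOB with PreparedStatement.setBytes and read them back with ResultSet.getBytes, with no Base64 step. The table stored_objects(id BIGINT PRIMARY KEY, payload LONGBLOB) and the class BlobStore are hypothetical; only standard java.io and java.sql APIs are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class BlobStore {

    // Serialize straight to bytes; no Base64 or character-set conversion.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Inverse of toBytes().
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of stored object is not on the classpath", e);
        }
    }

    // Hypothetical table: stored_objects(id BIGINT PRIMARY KEY, payload LONGBLOB)
    static void save(Connection conn, long id, Serializable obj) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO stored_objects (id, payload) VALUES (?, ?)")) {
            ps.setLong(1, id);
            ps.setBytes(2, toBytes(obj));
            ps.executeUpdate();
        }
    }

    // Returns the stored object, or null if there is no such row or the column is NULL.
    static Object load(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT payload FROM stored_objects WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                byte[] data = rs.getBytes(1);
                return data == null ? null : fromBytes(data);
            }
        }
    }
}
```

The trade-off is that Java serialization output can only practically be decoded by Java, which is the concern the question's separate rawValue column is meant to address.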