Reputation: 5130
Say we have a Postgres 12 database with a table called MyClass that has a Text column called notes. Users can save whatever they wish in this notes field. For the purpose of this question, let's assume that they have somehow bypassed all data sanitization.
Could the following lines of code ever be dangerous due to malicious text in obj.notes?
import pickle
# (obj is a Python3 instance of MyClass using the Django ORM, so obj.notes is always represented as a unicode string)
obj = MyClass.objects.get(id=1)
pickled = pickle.dumps(obj.notes)
unpickled = pickle.loads(pickled)
Upvotes: 2
Views: 423
Reputation: 4890
The Python pickle protocol (version 4) serialises a string as a token, followed by the length of the string, followed by the UTF-8 encoded content. The token is an opcode which marks that the data is to be interpreted as a string (and specifies the size of the length integer that follows it). So in theory, all of the encoded string data is copied directly into the new string object without being parsed, which gives the content of the string no opportunity to influence the behavior of the unpickler machine.
This means even a malicious string should still pickle and unpickle with no change, and has no opportunity to hijack the unpickler machine and run arbitrary code (unlike if the pickled-data itself had been compromised).
import pickle, pickletools
pickletools.dis(pickle.dumps("Hello World"))
For details, see the comments in the pickletools module source.
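As a rough check of the reasoning above, the sketch below pickles a string that deliberately looks like old-style pickle opcodes wrapping a shell command, then lists the opcodes the unpickler would actually see. The sample string and the helper name opcode_names are made up for illustration.

```python
import pickle
import pickletools

def opcode_names(data):
    """Return the names of the opcodes in a pickle stream, in order."""
    return [op.name for op, arg, pos in pickletools.genops(data)]

# Hypothetical hostile input: text that imitates protocol-0 opcodes.
hostile = "cos\nsystem\n(S'echo pwned'\ntR."

pickled = pickle.dumps(hostile, protocol=4)
names = opcode_names(pickled)
print(names)

# The content comes back as plain text; nothing in it was executed.
restored = pickle.loads(pickled)
print(restored == hostile)
```

The opcode list should contain only framing, string and stop opcodes. Opcodes that import or call things (GLOBAL, STACK_GLOBAL, REDUCE) would only appear if the pickled bytes themselves had been tampered with.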
Earlier (protocol version 0) instead of specifying a fixed length, the protocol used a delimiter to terminate the string, and expected escaping to be applied (in case the same delimiter was intended to also occur within the string). Alternatively, even with the existing protocol, you could re-implement the pickler to perform string compression of repeating sequences. Either way, the safety depends on the implementation of your pickle library being free of bugs.
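To see the delimiter-based encoding, here is a small sketch that pickles a string containing a newline with protocol 0. The variable names are only for illustration, and the exact escape sequence is what I would expect from CPython rather than something to rely on.

```python
import pickle

# The string contains a newline, which protocol 0 uses as its terminator.
text = "first line\nsecond line"

old_style = pickle.dumps(text, protocol=0)
print(old_style)

# Everything between the leading opcode byte and the first raw newline
# is the escaped string content.
payload = old_style[1:old_style.index(b"\n")]
print(payload)

restored = pickle.loads(old_style)
print(restored == text)
```

The embedded newline is written as an escape sequence, so it cannot terminate the string early. This is the step where a bug in the library's escaping would matter.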
Upvotes: 2
Reputation: 3895
It is extremely dangerous to use pickle files from other sources outside of your control, as pickles can contain code. That code can be just about anything, including shell commands against your system.
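A harmless sketch of how a pickle can carry a function call: the hypothetical Payload class below uses __reduce__ to tell the unpickler to call len("abc"). An attacker who controls the pickled bytes could name a far more dangerous callable in its place.

```python
import pickle

class Payload:
    """Hypothetical object whose pickle instructs the unpickler to call a function."""

    def __reduce__(self):
        # (callable, args): pickle.loads will call len("abc") while loading.
        return (len, ("abc",))

untrusted_bytes = pickle.dumps(Payload())

# Loading runs the call and returns its result, not a Payload instance.
result = pickle.loads(untrusted_bytes)
print(result)
```

This only applies when the pickled bytes come from someone else. It does not apply to bytes you produced yourself from a plain string.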
Pickling is also not always safe to do as you're describing it: taking ORM classes, pickling them, and then unpickling them can result in the new objects not having the proper links to the database sessions.
In the example you have, I would save the ID and use that to reload the object from the database. For other cases where I want to move data in and out of an app, I would recommend the safe_load function from pyyaml or the loads function from json (with the default decoder).
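For the notes field specifically, a minimal sketch of the json route might look like this. The sample text is invented, and the round trip uses only the standard library.

```python
import json

# Hypothetical user-supplied text with quotes, a newline and non-ASCII.
notes = 'Robert"); DROP TABLE students;-- \n\u2603'

encoded = json.dumps(notes)    # a JSON string literal, safe to store or send
decoded = json.loads(encoded)  # parsing only ever yields plain data types

print(decoded == notes)
```

Unlike pickle.loads, json.loads with the default decoder builds only dicts, lists, strings, numbers, booleans and None, so even attacker-supplied JSON cannot name a callable to run.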
Upvotes: -1