Reputation: 159
I have pickled instances of an object and have to accept them from untrusted sources. Each instance has internal state (just an array of integers) that I can use to recreate it without executing any code from the pickled object. Is it possible to extract only specific data objects from a pickle without executing any code from it?
Upvotes: 1
Views: 2548
Reputation: 35217
One idea is to read the pickled objects from the files as strings, then use pickletools.dis to see what's in them, allowing only a specific list of opcodes ('STOP', 'INT', …) in the opcode-name column. That would rule out the pickle containing any of the object types you are worried about, and if you are only targeting a very specific list of basic Python objects, you might be able to do this safely.
Here's what you get with pickletools.dis:
>>> import pickletools
>>> import pickle
>>>
>>> p1 = pickle.dumps(1)
>>> p2 = pickle.dumps(min)
>>>
>>> pickletools.dis(p1)
0: I INT 1
3: . STOP
highest protocol among opcodes = 0
>>> pickletools.dis(p2)
0: c GLOBAL '__builtin__ min'
17: p PUT 0
20: . STOP
highest protocol among opcodes = 0
>>>
It's better than writing a full pickle parser, and possibly doable if you only want to allow simple objects like INTs.
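The whitelist idea can also be applied programmatically instead of reading the disassembly by eye. The following is a minimal sketch, assuming the payload should contain nothing but integers and lists of integers; the name `uses_only_allowed_opcodes` and the contents of `ALLOWED_OPCODES` are illustrative choices, not something from the answer above. It uses `pickletools.genops`, which walks the opcodes without unpickling anything.

```python
import pickle
import pickletools

# Assumed whitelist: just enough opcodes to build ints and lists of ints.
# Anything that imports or calls (GLOBAL, STACK_GLOBAL, REDUCE, BUILD, ...)
# is deliberately left out.
ALLOWED_OPCODES = {
    "PROTO", "FRAME", "STOP", "MARK",
    "INT", "BININT", "BININT1", "BININT2", "LONG", "LONG1", "LONG4",
    "LIST", "EMPTY_LIST", "APPEND", "APPENDS",
    "PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE",
    "GET", "BINGET", "LONG_BINGET",
}


def uses_only_allowed_opcodes(data, allowed=ALLOWED_OPCODES):
    """Return True if every opcode in the pickle byte string is whitelisted."""
    try:
        return all(op.name in allowed
                   for op, _arg, _pos in pickletools.genops(data))
    except Exception:
        # Malformed or truncated input is treated as unsafe.
        return False


if __name__ == "__main__":
    print(uses_only_allowed_opcodes(pickle.dumps([1, 2, 3])))
    print(uses_only_allowed_opcodes(pickle.dumps(min)))
```

A pickle that passes this check cannot reference globals or call anything, so it should only be able to build the whitelisted data types. Treat the whitelist as a starting point and review it against the opcodes your own data actually needs.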
Upvotes: 1
Reputation: 7194
You can do this, but only if you parse the data yourself rather than relying on pickle, which could lead to arbitrary code execution. A very simple example of doing this could be:
import pickle
import re

class Test(object):
    def __init__(self, l):
        self.internal_list = l
        self.foo = 2
        self.bar = 24

# Create a pickled version of an object. Protocol 0 is the text format
# that the parsing below relies on.
t = Test([1,2,3,4,5,6,7,8,9,10])
with open("test.pickle", 'wb') as f:
    pickle.dump(t, f, 0)

# Load the pickled data as text, without unpickling it
with open("test.pickle", 'rb') as f:
    data = f.read().decode('ascii')
listdata = data[data.find("(lp"):].split('\n')  # Assumes that the class will only contain one list
                                                # if you need more then look for all lines starting "(lp"

# listdata[0] is "(lp<n>", where <n> is a memo index and not the list length,
# so collect elements until a line stops looking like "In" or "aIn"
reconstructed = []
for elem in listdata[1:]:
    match = re.match(r"a?I(-?\d+)$", elem)
    if match is None:
        break
    reconstructed.append(int(match.group(1)))
print(reconstructed)
Note that I've only tested the above code in Python 2.7.8; YMMV if you use it with other versions.
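String matching like this depends on the text layout of protocol 0. As a hedged alternative, the same "parse it yourself" idea can be sketched on top of `pickletools.genops`, which decodes opcodes for every protocol without unpickling. The function name `extract_first_int_list` and the two opcode sets are assumptions made for this illustration, and the sketch assumes the pickle was written by the standard pickler and that the first list in it is a flat list of integers.

```python
import pickle
import pickletools

INT_OPS = {"INT", "BININT", "BININT1", "BININT2", "LONG", "LONG1", "LONG4"}
# Opcodes that may appear inside a flat list of integers without adding data
LIST_GLUE_OPS = {"MARK", "APPEND", "APPENDS", "MEMOIZE",
                 "PUT", "BINPUT", "LONG_BINPUT"}


def extract_first_int_list(data):
    """Return the integers of the first list found in a pickle byte string.

    Only reads opcodes via pickletools.genops; nothing is unpickled.
    """
    values = []
    in_list = False
    for opcode, arg, _pos in pickletools.genops(data):
        if not in_list:
            # The standard pickler starts a list with LIST (protocol 0)
            # or EMPTY_LIST (protocol 1 and later).
            in_list = opcode.name in ("LIST", "EMPTY_LIST")
        elif opcode.name in INT_OPS:
            values.append(int(arg))
        elif opcode.name not in LIST_GLUE_OPS:
            break  # first opcode that cannot belong to a flat int list
    return values


if __name__ == "__main__":
    # A plain dict stands in for an instance's attribute dictionary here.
    state = {"internal_list": [1, 2, 3, 4, 5], "foo": 2, "bar": 24}
    print(extract_first_int_list(pickle.dumps(state)))
```

Because the loop stops at the first opcode that is neither an integer nor list bookkeeping, integers stored after the list (such as `foo` and `bar` in the example state) are not picked up.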
Upvotes: 0