user2952698

Reputation: 159

Safely extracting partial data from pickled objects

I have to accept pickled instances of an object from untrusted sources. Each instance has internal state (just an array of integers) that is enough to recreate it without executing any code from the pickled object. Is it possible to extract only certain data objects from a pickle without executing any code from it?

Upvotes: 1

Views: 2548

Answers (2)

Mike McKerns

Reputation: 35217

One idea is to read the pickled objects from the files as strings, then use pickletools.dis to see what's in them, allowing only a specific list of opcodes ('STOP', 'INT', …) to appear in the second column. That rules out any pickle containing the kinds of objects you are worried about, and if you are only targeting a very specific list of basic Python objects, you might be able to do this safely.

Here's what you get with pickletools.dis:

>>> import pickletools
>>> import pickle           
>>> 
>>> p1 = pickle.dumps(1)
>>> p2 = pickle.dumps(min)
>>> 
>>> pickletools.dis(p1)
    0: I    INT        1
    3: .    STOP
highest protocol among opcodes = 0
>>> pickletools.dis(p2)
    0: c    GLOBAL     '__builtin__ min'
   17: p    PUT        0
   20: .    STOP
highest protocol among opcodes = 0
>>> 

It's better than writing a full pickle parser, and possibly doable if you only want to allow simple objects like INTs.
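The allow-list check described above could be sketched roughly as follows. This is only an illustration, written for Python 3 rather than the Python 2 session shown earlier: it uses pickletools.genops (the iterator behind pickletools.dis) instead of parsing the printed output, and the ALLOWED_OPCODES set and the is_allowed helper are hypothetical names chosen here, sized for a flat list of integers. As far as I know none of these opcodes import or call anything, but treat that as an assumption to verify for your own case.

```python
import pickle
import pickletools

# Hypothetical allow-list: opcodes needed for a flat list of ints
# across the common protocols. Anything else (GLOBAL, REDUCE, ...) is rejected.
ALLOWED_OPCODES = {
    "PROTO", "FRAME", "STOP", "MARK",
    "INT", "LONG", "BININT", "BININT1", "BININT2", "LONG1",
    "EMPTY_LIST", "LIST", "APPEND", "APPENDS",
    "PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE",
}


def is_allowed(data):
    """Return True if every opcode in the pickle bytes is on the allow-list."""
    try:
        # genops only decodes the opcode stream; it does not unpickle anything.
        return all(op.name in ALLOWED_OPCODES
                   for op, arg, pos in pickletools.genops(data))
    except Exception:
        # Malformed or truncated input is treated as not allowed.
        return False


print(is_allowed(pickle.dumps([1, 2, 3], protocol=0)))
print(is_allowed(pickle.dumps(min, protocol=0)))
```

The first call should pass and the second should be rejected because of its GLOBAL opcode. A check like this says nothing about size, so a length limit on the input is probably worth adding too.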

Upvotes: 1

Simon Gibbons

Reputation: 7194

You can do this, but only if you parse the data yourself rather than relying on pickle, which could lead to arbitrary code execution. A very simple example of doing so could be

import pickle
import re

class Test(object):
    def __init__(self, l):
        self.internal_list = l
        self.foo = 2
        self.bar = 24

# Create a pickled version of an object
# (Python 2 defaults to protocol 0, the text format parsed below)
t = Test([1,2,3,4,5,6,7,8,9,10])
with open("test.pickle", 'w') as f:
    pickle.dump(t, f)

# Load the pickled data as plain text; pickle.load is never called
with open("test.pickle") as f:
    data = f.read()
listdata = data[data.find("(lp"):].split('\n') # Assumes that the class will only contain one list
                                               # if you need more then look for all lines starting "(lp"

# listdata[0] is "(lpN", where N is a memo index, not the length of the list.
# Each element of the list should be of the form "In" or "aIn",
# so collect lines until the first one that does not match.
reconstructed = []
for elem in listdata[1:]:
    match = re.match(r"a?I(-?\d+)$", elem)
    if match is None:
        break
    reconstructed.append(int(match.group(1)))
print reconstructed

Note that I've only tested the above code in Python 2.7.8; YMMV if you use it with other versions.
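For other versions, the same "parse it yourself" idea could be sketched without depending on the text layout of protocol 0. The following is a rough Python 3 illustration, not the code tested above: it walks the opcode stream with pickletools.genops, which decodes opcodes without unpickling, and collects the integers that follow the first list opcode. The function name extract_first_int_list and the two opcode sets are hypothetical names chosen here, and like the original it assumes the object contains a single flat list of integers.

```python
import pickle
import pickletools

# Opcodes that push an integer; genops already decodes their argument.
INT_OPS = {"INT", "LONG", "BININT", "BININT1", "BININT2", "LONG1", "LONG4"}
# Bookkeeping opcodes that may appear between list elements.
SKIP_OPS = {"PUT", "BINPUT", "LONG_BINPUT", "MEMOIZE",
            "MARK", "APPEND", "APPENDS"}


def extract_first_int_list(data):
    """Return the ints of the first list in the pickle bytes, or None.

    Nothing is unpickled: only the opcode stream is decoded.
    """
    values = None
    for op, arg, pos in pickletools.genops(data):
        if values is None:
            # Still looking for the opcode that starts a list.
            if op.name in ("LIST", "EMPTY_LIST"):
                values = []
        elif op.name in INT_OPS:
            values.append(arg)
        elif op.name not in SKIP_OPS:
            # First opcode that is neither an int nor bookkeeping ends the list.
            break
    return values


blob = pickle.dumps({"internal_list": [1, 2, 3], "foo": 2}, protocol=0)
print(extract_first_int_list(blob))
```

On malformed input genops raises ValueError, so untrusted data should be wrapped in a try/except by the caller.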

Upvotes: 0
