Ben
Ben

Reputation: 675

How to read a column of 'dictionary' into pandas

I have a csv file with a column whose type is a dictionary (column 'b' in the example below). But b in df is a string type even though I define it as the dictionary type. I didn't find the solution to this question. Any suggestions?

a = pd.DataFrame({'a': [1,2,3], 'b':[{'a':3, 'haha':4}, {'c':3}, {'d':4}]})
a.to_csv('tmp.csv', index=None)
df = pd.read_csv('tmp.csv', dtype={'b':dict})

Upvotes: 4

Views: 9543

Answers (3)

Uri Barenholz
Uri Barenholz

Reputation: 731

You could use the converters parameter. From the documentation:

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

If you know your column is well formed and includes no missing values then you could do:

df = pd.read_csv('tmp.csv', converters = {'b': ast.literal_eval})

However, for safety (as others have commented) you should probably define your own function with some basic error resilience:

def to_dict(x):
    try:
        y = ast.literal_eval(x)
        if type(y) == dict:
            return y
    except:
        return None

and then:

df = pd.read_csv('tmp.csv', converters = {'b': to_dict})

Upvotes: 2

MSeifert
MSeifert

Reputation: 152870

There is no dictionary type in pandas. So you should specify object in case you want normal Python objects:

df = pd.read_csv('tmp.csv', dtype={'b':object})

This will contain strings because pandas doesn't know what dictionaries are. In case you want dictionaries again you could try to "eval" them with ast.literal_eval (safe string evaluation):

df['b'] = df['b'].apply(ast.literal_eval)

print(df['b'][0]['a'])  # 3

If you're really confident that you never ever run this on untrusted csvs you could also use eval. But before you consider that I would recommend trying to use a DataFrame only with "native" pandas or NumPy types (or maybe a DataFrame in DataFrame approach). Best to avoid object dtypes as much as possible.

Upvotes: 2

John Zwinck
John Zwinck

Reputation: 249652

I wonder if your CSV column is actually meant to be a Python dict column, or rather JSON. If it's JSON, you can read the column as dtype=str, then use json_normalize() on that column to explode it into multiple columns. This is an efficient solution assuming the column contains valid JSON.

Upvotes: 3

Related Questions