Reputation: 675
I have a CSV file with a column that should hold dictionaries (column 'b' in the example below). But 'b' in df comes back as a string type even though I pass dict as the dtype. I couldn't find a solution to this question. Any suggestions?
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [{'a': 3, 'haha': 4}, {'c': 3}, {'d': 4}]})
a.to_csv('tmp.csv', index=None)
df = pd.read_csv('tmp.csv', dtype={'b': dict})
Upvotes: 4
Views: 9543
Reputation: 731
You could use the converters parameter. From the documentation:
converters : dict, optional
Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
If you know your column is well formed and includes no missing values, then you could do:

import ast

df = pd.read_csv('tmp.csv', converters={'b': ast.literal_eval})
However, for safety (as others have commented) you should probably define your own function with some basic error resilience:
def to_dict(x):
    try:
        y = ast.literal_eval(x)
        if isinstance(y, dict):
            return y
    except (ValueError, SyntaxError):
        return None
and then:
df = pd.read_csv('tmp.csv', converters = {'b': to_dict})
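For illustration, here is a quick self-contained round trip (using an in-memory CSV rather than the question's tmp.csv) showing how the converter recovers well-formed dicts and turns a malformed cell into None:

```python
import ast
import io

import pandas as pd

def to_dict(x):
    # Safely parse a string into a dict; return None on any failure.
    try:
        y = ast.literal_eval(x)
        if isinstance(y, dict):
            return y
    except (ValueError, SyntaxError):
        return None

# One well-formed dict cell and one malformed cell.
csv_data = io.StringIO("a,b\n1,\"{'a': 3, 'haha': 4}\"\n2,not-a-dict\n")
df = pd.read_csv(csv_data, converters={'b': to_dict})

print(df['b'][0])  # {'a': 3, 'haha': 4}
print(df['b'][1])  # None
```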
Upvotes: 2
Reputation: 152870
There is no dictionary dtype in pandas, so you should specify object in case you want normal Python objects:
df = pd.read_csv('tmp.csv', dtype={'b':object})
This will contain strings because pandas doesn't know what dictionaries are. In case you want dictionaries again you could try to "eval" them with ast.literal_eval (safe string evaluation):
df['b'] = df['b'].apply(ast.literal_eval)
print(df['b'][0]['a']) # 3
If you're really confident that you never ever run this on untrusted CSVs you could also use eval. But before you consider that I would recommend trying to use a DataFrame only with "native" pandas or NumPy types (or maybe a DataFrame-in-DataFrame approach). Best to avoid object dtypes as much as possible.
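One way to follow that advice, sketched here with pd.json_normalize once the dicts have been parsed back from strings (the sample data is the question's), is to expand the dict column into ordinary flat columns:

```python
import pandas as pd

# Dicts already parsed back from strings (e.g. via ast.literal_eval).
df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [{'a': 3, 'haha': 4}, {'c': 3}, {'d': 4}]})

# One native column per dict key; rows missing a key get NaN.
expanded = pd.json_normalize(df['b'].tolist()).add_prefix('b.')
result = pd.concat([df.drop(columns='b'), expanded], axis=1)
print(result.columns.tolist())  # ['a', 'b.a', 'b.haha', 'b.c', 'b.d']
```

The add_prefix call is just there to avoid a name clash between the original column 'a' and the dict key 'a'.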
Upvotes: 2
Reputation: 249652
I wonder if your CSV column is actually meant to be a Python dict column, or rather JSON. If it's JSON, you can read the column as dtype=str, then use json_normalize() on that column to explode it into multiple columns. This is an efficient solution assuming the column contains valid JSON.
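A minimal sketch of that approach, assuming the cells hold valid JSON objects (note that JSON requires double-quoted keys, unlike the Python-dict repr in the question; the quotes are doubled below for CSV escaping):

```python
import io
import json

import pandas as pd

# A CSV whose 'b' column holds JSON objects.
csv_data = io.StringIO('a,b\n1,"{""x"": 3, ""y"": 4}"\n2,"{""x"": 5}"\n')
df = pd.read_csv(csv_data, dtype={'b': str})

# Parse each cell as JSON, then explode into one column per key.
expanded = pd.json_normalize(df['b'].apply(json.loads).tolist())
print(expanded.columns.tolist())  # ['x', 'y']
```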
Upvotes: 3