Reputation: 153
I saved a pandas dataframe that looks like the following as a csv file.
                 a
0    {'word': 5.7}
1  {'khfds': 8.34}
When I attempt to read the dataframe as shown below, I receive the following error.
df = pd.read_csv('foo.csv', index_col=0, dtype={'str': 'dict'})
TypeError: data type "dict" not understood
The heart of my question is: how do I read the csv file so that I recover the dataframe in the same form as when it was created? I have also tried reading without dtype={} and replacing 'dict' with alternatives such as 'dictionary', 'object', and 'str'.
Upvotes: 10
Views: 7569
Reputation: 1
(I don't have enough reputation to comment.) Even after applying ast.literal_eval, I still got "ValueError: malformed node or string" on some dict columns.
Fixing the spacing in the dict strings fixed the issue for me. Example:
Before:
ast.literal_eval("{'word' : 5.7}, {'khfds' : 8.34}")
After:
ast.literal_eval("{'word': 5.7}, {'khfds': 8.34}")
Hope this helps someone.
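If the error persists, one way to track down which cells cause it is to try each value separately and collect the failures. This is only a rough sketch: find_unparseable is a hypothetical helper name, and the sample cells are made up for illustration (a missing CSV cell is read by pandas as a float NaN, which literal_eval rejects with this same ValueError).

```python
from ast import literal_eval

def find_unparseable(values):
    """Return (position, value, error) for every entry literal_eval rejects."""
    failures = []
    for position, value in enumerate(values):
        try:
            literal_eval(value)
        except (ValueError, SyntaxError) as exc:
            failures.append((position, value, exc))
    return failures

# Illustrative cells: a valid dict string, a dict string containing a bare
# nan, and a float NaN such as pandas produces for an empty cell.
cells = ["{'word': 5.7}", "{'khfds': nan}", float('nan')]

failures = find_unparseable(cells)
for position, value, exc in failures:
    print(position, repr(value), type(exc).__name__)
```

With a real dataframe you could pass df['a'] instead of the cells list and inspect the reported positions before deciding how to clean them.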
Upvotes: 0
Reputation: 360
You can also convert to dictionaries directly while reading the csv file, as follows:
import pandas as pd
from ast import literal_eval
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr, converters={'a': literal_eval})
print(df.iloc[0]['a']['word'])
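To tie this back to the question, here is a minimal round-trip sketch: it builds a frame like the one in the question, writes it out, and reads it back with index_col=0 plus the converter. The in-memory buffer is just a stand-in for 'foo.csv', and the names original and restored are my own.

```python
import pandas as pd
from ast import literal_eval
from io import StringIO

# Build a frame like the one in the question.
original = pd.DataFrame({'a': [{'word': 5.7}, {'khfds': 8.34}]})

# Write to an in-memory buffer standing in for 'foo.csv'.
buffer = StringIO()
original.to_csv(buffer)
buffer.seek(0)

# index_col=0 restores the saved index; the converter parses each cell
# of column 'a' from its string form back into a dict.
restored = pd.read_csv(buffer, index_col=0, converters={'a': literal_eval})

print(restored['a'].tolist())
print(type(restored.loc[0, 'a']))
```

Note that converters is keyed by column name, so each dict-valued column needs its own entry.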
Upvotes: 4
Reputation: 1579
You may also use Python's plain and simple built-in eval, as follows:
import pandas as pd
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr)
df['a'] = df['a'].apply(eval)
print(df['a'].apply(lambda x: type(x)))
0 <class 'dict'>
1 <class 'dict'>
Name: a, dtype: object
Upvotes: -3
Reputation: 164663
CSV files may only contain text, so dictionaries are out of scope. Therefore, you need to read the text literally to convert to dict. One way is using ast.literal_eval:
import pandas as pd
from ast import literal_eval
from io import StringIO
mystr = StringIO("""a
{'word': 5.7}
{'khfds': 8.34}""")
df = pd.read_csv(mystr)
df['a'] = df['a'].apply(literal_eval)
print(df['a'].apply(lambda x: type(x)))
0 <class 'dict'>
1 <class 'dict'>
Name: a, dtype: object
However, I strongly recommend you do not use Pandas specifically to store pointers to dictionaries. Pandas works best with contiguous memory blocks, e.g. separate numeric data into numeric series.
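As a rough sketch of what that could look like, the dictionaries can be unpacked into a long table with one row per key/value pair, so the numbers end up in a proper float column. The column names 'row', 'key' and 'value' and the name long_df are only illustrative choices, and this layout assumes every value is numeric.

```python
import pandas as pd

df = pd.DataFrame({'a': [{'word': 5.7}, {'khfds': 8.34}]})

# One output row per (original row, key, value) triple.
# Series.items() yields (index label, dict) pairs.
long_df = pd.DataFrame(
    [(row, key, value)
     for row, d in df['a'].items()
     for key, value in d.items()],
    columns=['row', 'key', 'value'],
)

print(long_df)
print(long_df['value'].dtype)
```

A table in this shape also round-trips through to_csv and read_csv without any literal_eval step.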
Upvotes: 5