Reputation: 1736
I have to convert DataFrames with 1M rows to a dict. The standard pandas method is quite slow.
import pandas as pd
import numpy as np
df = pd.DataFrame(data={"col": np.ones(100000)})
%time result = df.to_dict(orient="index").values()
CPU times: user 5.88 s, sys: 81.3 ms, total: 5.96 s
Wall time: 6.23 s
Is there a way to improve the speed of this process?
Upvotes: 1
Views: 2113
Reputation: 402872
If all you need are the values, using orient='records' drastically improves performance.
In [43]: %timeit df.to_dict('i').values()
1 loop, best of 3: 6.23 s per loop
In [42]: %timeit df.to_dict('r')
1 loop, best of 3: 822 ms per loop
'r' is an alias for 'records'.
Also, note that an advantage of using 'r' is that the result is already a list, whereas with the former you need to convert the dict_values to a list afterwards.
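As a minimal sketch of the point above, the snippet below checks that both approaches produce the same list of row dicts. The smaller row count and the variable names `records` and `via_index` are my own choices for illustration, and the orient is spelled out as 'records' because newer pandas versions may not accept the short alias.

```python
import numpy as np
import pandas as pd

# Small frame for illustration; the question uses a much larger one.
df = pd.DataFrame(data={"col": np.ones(1000)})

# orient="records" returns a list with one dict per row.
records = df.to_dict(orient="records")

# orient="index" returns {index_label: row_dict}; its values must be
# wrapped in list() to get the same structure.
via_index = list(df.to_dict(orient="index").values())

print(type(records).__name__, len(records))
print(records == via_index)
```

Timings will depend on the pandas version and the machine, so it is worth re-running %timeit on your own data before relying on the numbers quoted here.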
Upvotes: 2