vwrobel

Reputation: 1736

In Python pandas, is there a way to improve the speed of dataframe to dict conversion?

I have to convert DataFrames with around 1M rows to a dict, and the standard pandas method is slow. Even the 100,000-row example below takes about 6 seconds.

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"col": np.ones(100000)})
%time result = df.to_dict(orient="index").values()  # avoid shadowing the built-in dict

CPU times: user 5.88 s, sys: 81.3 ms, total: 5.96 s
Wall time: 6.23 s

Is there a way to improve the speed of this process?

Upvotes: 1

Views: 2113

Answers (1)

cs95

Reputation: 402872

If all you need are the values, use orient='records' instead. In the timings below it is roughly 7.5x faster (822 ms versus 6.23 s).

In [43]: %timeit df.to_dict('i').values()
1 loop, best of 3: 6.23 s per loop

In [42]: %timeit df.to_dict('r')
1 loop, best of 3: 822 ms per loop

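'r' is an alias for 'records' and 'i' for 'index'. These short aliases were deprecated and then removed in pandas 2.0, so on recent versions spell out orient='records' in full.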

Another advantage of orient='records' is that the result is already a list, whereas with orient='index' you still need to convert the dict_values view to a list afterwards.
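As a minimal sketch of the point above, the snippet below checks that both approaches yield the same row dicts. The 5-row frame and the variable names by_index and records are illustrative assumptions, not part of the original timings.

```python
import numpy as np
import pandas as pd

# Small frame for illustration; same single-column shape as in the question.
df = pd.DataFrame(data={"col": np.ones(5)})

# orient="index" builds {index_label: {column: value}};
# .values() is only a view, so it has to be wrapped in list().
by_index = list(df.to_dict(orient="index").values())

# orient="records" returns a list of {column: value} dicts directly.
records = df.to_dict(orient="records")

print(type(records).__name__)   # list
print(records == by_index)      # True
```

Both calls give the same list of row dicts; orient='records' simply skips building the outer dict keyed by index label.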

Upvotes: 2
