Reputation: 426
How can I use Pandas read_csv to convert a big list quickly into a dataframe?
import pandas as pd
x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(x)
I know that I could split the string by comma, put the pieces into a list, and convert that to a dataframe, but I was wondering whether there is a way to do this with pd.read_csv that would be faster.
Upvotes: 2
Views: 961
Reputation: 294218
from io import StringIO  # standard-library text buffer; read_csv accepts any file-like object
x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(StringIO(x), header=None)
df
   0  1  2  3  4  5  6  7
0  1  2  3  4  5  7  8  9
This is the best you can do with pd.read_csv.
Consider this much larger string:
y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 1000)
And compare the timing of these two options:
%timeit pd.DataFrame([l.split(',') for l in y.split('\n')]).astype(int)
%timeit pd.read_csv(pd.io.common.StringIO(y), header=None)
1 loop, best of 3: 200 ms per loop
10 loops, best of 3: 125 ms per loop
If all we needed to do was split the string, split would be faster. However, one of the things pd.read_csv does for us is parse the integers. Doing that conversion as a separate step after the split (the astype(int) call above) adds overhead that gets expensive.
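For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch using the standard timeit module instead of %timeit. The helper names via_split and via_read_csv are made up for illustration, and the numbers you get will depend on your machine and pandas version, so treat it as a way to check the trend rather than a definitive benchmark.

```python
import timeit
from io import StringIO

import pandas as pd

# Same test data as above: 1000 rows, each with 1000 comma-separated digits.
y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 1000)


def via_split(text):
    # Hypothetical helper: split manually, then convert the strings to integers.
    return pd.DataFrame([line.split(',') for line in text.split('\n')]).astype(int)


def via_read_csv(text):
    # Hypothetical helper: let the CSV parser split and parse integers together.
    return pd.read_csv(StringIO(text), header=None)


df_split = via_split(y)
df_csv = via_read_csv(y)

# Sanity check: both approaches should hold the same values.
same_values = (df_split.to_numpy() == df_csv.to_numpy()).all()

for name, func in [('split + astype', via_split), ('read_csv', via_read_csv)]:
    # Best of 3 single runs, reported in milliseconds.
    seconds = min(timeit.repeat(lambda: func(y), number=1, repeat=3))
    print(f'{name}: {seconds * 1000:.0f} ms')
```

Dropping the .astype(int) call from via_split lets you time the bare split on its own and see how much of the cost comes from the integer conversion.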
Upvotes: 4