kPow989

Reputation: 426

Use pandas.read_csv to convert a comma-separated string list into a DataFrame

How can I use Pandas read_csv to convert a big list quickly into a dataframe?

import pandas as pd

x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(x)

I know that I could split the string by comma, put the pieces into a list, and convert that to a DataFrame, but I was wondering whether there is a way to do this with pd.read_csv that would be faster.
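For reference, the split-based route described above might look like the following minimal sketch. The names `values` and `df_split` are illustrative only and do not come from the original post.

```python
import pandas as pd

x = '1,2,3,4,5,7,8,9'

# Split on commas and convert each token to an integer.
values = [int(token) for token in x.split(',')]

# Wrap the list in another list so the values form a single row.
df_split = pd.DataFrame([values])
```

This gives a one-row frame with eight integer columns, the same layout that read_csv produces for a single line of input.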

Upvotes: 2

Views: 961

Answers (1)

piRSquared

Reputation: 294218

from io import StringIO  # standard-library text buffer

x = '1,2,3,4,5,7,8,9'
df = pd.read_csv(StringIO(x), header=None)

df

   0  1  2  3  4  5  6  7
0  1  2  3  4  5  7  8  9

This is the best you can do with pd.read_csv.


Consider this much larger string:

y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 1000)

Now compare the timing of these two options:

from io import StringIO

%timeit pd.DataFrame([l.split(',') for l in y.split('\n')]).astype(int)
%timeit pd.read_csv(StringIO(y), header=None)

1 loop, best of 3: 200 ms per loop
10 loops, best of 3: 125 ms per loop

If all we needed to do were split the string, str.split would be faster. However, pd.read_csv also parses the integers for us. Doing that conversion separately after the split (the .astype(int) step) adds enough overhead to make the split approach slower overall.
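To repeat the comparison outside IPython, something like the sketch below could be used. It relies on the standard-library timeit module instead of the %timeit magic, and it uses a smaller string than the one above (an assumption made only to keep the run short). The function names `via_split` and `via_read_csv` are hypothetical helpers, and the timings will vary by machine and pandas version.

```python
import timeit
from io import StringIO

import pandas as pd

# 100 rows of 1000 integers each; smaller than the string in the answer (assumed size).
y = '\n'.join([','.join(['0,1,2,3,4,5,6,7,8,9'] * 100)] * 100)

def via_split(text):
    # Split into rows and fields, then convert the string cells to integers.
    return pd.DataFrame([line.split(',') for line in text.split('\n')]).astype(int)

def via_read_csv(text):
    # Let the CSV parser tokenize and parse the integers in one pass.
    return pd.read_csv(StringIO(text), header=None)

df_split = via_split(y)
df_csv = via_read_csv(y)

# Sanity check: both routes should produce the same values.
same_values = (df_split.values == df_csv.values).all()

t_split = timeit.timeit(lambda: via_split(y), number=5)
t_csv = timeit.timeit(lambda: via_read_csv(y), number=5)
print(f"split + astype: {t_split:.3f} s, read_csv: {t_csv:.3f} s")
```

Checking that both frames hold the same values before timing them keeps the comparison honest: the two routes are only interchangeable if they produce the same result.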

Upvotes: 4
