P. Solar

Reputation: 359

Pandas DataFrame from raw string

I've got a string which looks like:

a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3\n...

Is there an efficient and smart way to convert this kind of string into a Pandas DataFrame? StringIO doesn't seem to be the right tool for this.

Thanks in advance!!

Upvotes: 4

Views: 3250

Answers (2)

Mabel Villalba

Reputation: 2598

Python 2.7

You just need to specify the delimiter with sep='\t', and also make the string a unicode literal (u'...') to avoid errors, because io.StringIO in Python 2.7 only accepts unicode:

import io
import pandas as pd

pd.read_csv(io.StringIO(u'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'),
            sep="\t", header=None)
    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3
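In Python 3 the u prefix is no longer needed, since every str literal is already unicode. A minimal sketch of the same call under Python 3 (the variable names data and df are only for illustration):

```python
import io

import pandas as pd

# In Python 3 every str literal is unicode, so io.StringIO accepts it directly.
data = 'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'

# sep='\t' splits each line on tabs; header=None treats the first line as data
# and numbers the columns 0, 1, 2.
df = pd.read_csv(io.StringIO(data), sep='\t', header=None)
print(df)
```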

Upvotes: 3

cs95

Reputation: 402593

StringIO works perfectly.

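import io
import pandas as pd

string = 'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'
# sep=r'\s+' splits on runs of whitespace; it is equivalent to the older
# delim_whitespace=True, which is deprecated as of pandas 2.2
pd.read_csv(io.StringIO(string), sep=r'\s+', header=None)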

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3
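If StringIO appears not to work, the usual culprit is the string's contents rather than StringIO. When the data really is a raw string, i.e. it holds the literal two-character sequences backslash-t and backslash-n instead of tab and newline characters, read_csv sees one long line with no separators. A minimal sketch, assuming that is the situation (the names raw and text are illustrative), converts the escapes first:

```python
import io

import pandas as pd

# A raw string literal keeps backslashes: r'\t' is two characters, '\\' and 't'.
raw = r'a1\tb1\tc1\na2\tb2\tc2\na3\tb3\tc3'

# Replace only these two escape sequences with real tab and newline characters.
text = raw.replace('\\t', '\t').replace('\\n', '\n')

df = pd.read_csv(io.StringIO(text), sep='\t', header=None)
print(df)
```

Plain str.replace is used so that no other backslash sequences in the data are touched.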

You can also use pd.read_table or pd.read_fwf in the same manner:

pd.read_table(io.StringIO(string), header=None)

Or,

pd.read_fwf(io.StringIO(string), header=None)

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3

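In these last two examples no separator is passed: read_table uses a tab separator by default, while read_fwf infers fixed-width column boundaries from the data, treating tabs and spaces as filler. Either way, your raw string must keep a consistent structure, with the same number of fields on every line.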
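The consistency requirement matters most when a field can be empty. A minimal sketch with a made-up second row containing an empty field (two tabs in a row) shows how splitting on each tab differs from splitting on runs of whitespace:

```python
import io

import pandas as pd

# Hypothetical data: the second row has an empty middle field.
data = 'a1\tb1\tc1\na2\t\tc2\na3\tb3\tc3'

# Splitting on every tab keeps the empty field, which becomes NaN.
by_tab = pd.read_csv(io.StringIO(data), sep='\t', header=None)

# Splitting on runs of whitespace collapses the two tabs into one separator,
# so 'c2' shifts into column 1 and the missing trailing field is filled with NaN.
by_ws = pd.read_csv(io.StringIO(data), sep=r'\s+', header=None)

print(by_tab)
print(by_ws)
```

So if your data can contain empty fields, an explicit sep='\t' is the safer choice.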


Finally, you can also use a string splitting approach, splitting on newlines first and then on whitespace (str.split with no arguments splits on any run of spaces or tabs):

pd.DataFrame(list(map(str.split, string.splitlines())))

    0   1   2
0  a1  b1  c1
1  a2  b2  c2
2  a3  b3  c3
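A variant of the splitting approach that splits on each tab explicitly keeps empty fields as empty strings instead of dropping them. A minimal sketch; the column names are hypothetical placeholders:

```python
import pandas as pd

# Made-up data with an empty middle field in the second row.
data = 'a1\tb1\tc1\na2\t\tc2\na3\tb3\tc3'

# splitlines() also ignores a trailing newline; split('\t') keeps '' for an
# empty field, whereas split() with no argument would drop it.
rows = [line.split('\t') for line in data.splitlines()]

# Illustrative column names; pass whatever fits your data.
df = pd.DataFrame(rows, columns=['col_a', 'col_b', 'col_c'])
print(df)
```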

Upvotes: 8
