Shakti
Shakti

Reputation: 2033

How to parse delimited data in python ( pandas ) to create DataFrame?

I have some raw data in the following format where record delimiter is ~ and element delimiter is |.

date|o|h|l|c|e|f~07-12-2012 09:15|5934.0000|5945.5000|5934.0000|5938.6500|1749606|1749606~07-12-2012 09:16|5939.1000|5941.8000|5936.3500|5941.8000|1064557|2814163

Now I want to parse this data into a pandas data frame, but I guess the format that pandas data frame understands is key - column . So basically i am able to parse this data into three rows using split('~')

date|o|h|l|c|e|f
07-12-2012 09:15|5934.0000|5945.5000|5934.0000|5938.6500|1749606|1749606
07-12-2012 09:16|5939.1000|5941.8000|5936.3500|5941.8000|1064557|2814163

But is there a api using which I can split it on a basis of column so something like as shown below,or is there a api in pandas which I can use directly to feed my data

date - 07-12-2012 09:15,07-12-2012 09:16
o - 5934.0000,5939.1000 
h ..... etc
l
c
e
f 

Upvotes: 0

Views: 2454

Answers (2)

Wes McKinney
Wes McKinney

Reputation: 105491

I'm looking to make this much simpler to do with read_table, i.e.:

df = read_table(path, sep='|', lineterminator='~')

Look out for the next pandas release:

http://github.com/pydata/pandas/issues/2457

EDIT: this is done and works in pandas 0.10

Upvotes: 2

Hyperboreus
Hyperboreus

Reputation: 32429

I am not quite sure if this is what you are looking for:

s = 'date|o|h|l|c|e|f~07-12-2012 09:15|5934.0000|5945.5000|5934.0000|5938.6500|1749606|1749606~07-12-2012 09:16|5939.1000|5941.8000|5936.3500|5941.8000|1064557|2814163'

rows = s.split ('~')
d = {}
keys = rows [0].split ('|')
for key in keys: d [key] = []
for row in rows [1:]:
    for idx, value in enumerate (row.split ('|') ):
        d [keys [idx] ].append (value)

print (d)

Upvotes: 1

Related Questions