Problems importing text file into Python/Pandas

Question

I am trying to load a really messy text file into Python/Pandas. Here is an example of what the data in the file looks like:

('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc','Seattle, WA','2014-08-05 10:06:24','viewed_home_page'),('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc','Seattle, WA','2014-08-05 10:06:36','viewed_search_results'),('41aa8fac-1bd8-4f95-918c-413879ed43f1','bcca257d-68d3-47e6-bc58-52c166f3b27b','Madison, WI','2014-08-16 17:42:31','visit_start')

Here is my code:

import pandas as pd
cols=['ID','Visit','Market','Event Time','Event Name']
table=pd.read_table(r'C:\Users\Desktop\Dump.txt',sep=',', header=None,names=cols,nrows=10)

But when I look at the table, the data is not read correctly: almost all of it ends up on one row.

unutbu · Accepted Answer

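The file is one long line of Python-style tuples rather than one record per line, so read_table with sep=',' treats it as a single row (and also splits on the comma inside values such as 'Seattle, WA'). You could instead use ast.literal_eval to parse the text into a Python tuple of tuples, and then call pd.DataFrame on that: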

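import ast

import pandas as pd

cols = ['ID', 'Visit', 'Market', 'Event Time', 'Event Name']
filename = r'C:\Users\Desktop\Dump.txt'  # path from the question

# Open in text mode: on Python 3, ast.literal_eval needs a str, not bytes.
with open(filename, 'r') as f:
    # The file's single line evaluates to a tuple of 5-element tuples.
    data = ast.literal_eval(f.read())

df = pd.DataFrame(list(data), columns=cols)
print(df)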

yields

                                     ID                                 Visit  \
0  9ebabd77-45f5-409c-b4dd-6db7951521fd  9da3f80c-6bcd-44ae-bbe8-760177fd4dbc   
1  9ebabd77-45f5-409c-b4dd-6db7951521fd  9da3f80c-6bcd-44ae-bbe8-760177fd4dbc   
2  41aa8fac-1bd8-4f95-918c-413879ed43f1  bcca257d-68d3-47e6-bc58-52c166f3b27b   

        Market           Event Time             Event Name  
0  Seattle, WA  2014-08-05 10:06:24       viewed_home_page  
1  Seattle, WA  2014-08-05 10:06:36  viewed_search_results  
2  Madison, WI  2014-08-16 17:42:31            visit_start
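
To try the approach without the original file, here is a minimal sketch that runs on an inline sample taken from the question (shortened to two records). The helper name parse_records is hypothetical, not part of pandas or ast. It assumes the text always has the "(...),(...)" shape shown in the question, and it wraps the result when the file holds only one record, since a single "(...)" evaluates to one flat tuple of strings rather than a tuple of tuples. The conversion of Event Time with pd.to_datetime is an optional extra, not something the question asked for.

```python
import ast

import pandas as pd

# Sample text in the same shape as the question's file (two records only).
raw = (
    "('9ebabd77-45f5-409c-b4dd-6db7951521fd','9da3f80c-6bcd-44ae-bbe8-760177fd4dbc',"
    "'Seattle, WA','2014-08-05 10:06:24','viewed_home_page'),"
    "('41aa8fac-1bd8-4f95-918c-413879ed43f1','bcca257d-68d3-47e6-bc58-52c166f3b27b',"
    "'Madison, WI','2014-08-16 17:42:31','visit_start')"
)

cols = ['ID', 'Visit', 'Market', 'Event Time', 'Event Name']


def parse_records(text):
    """Hypothetical helper: turn "(...),(...)" text into a list of row tuples."""
    data = ast.literal_eval(text.strip())
    # A single record evaluates to one flat tuple of strings,
    # so wrap it to keep the result a sequence of rows.
    if data and not isinstance(data[0], tuple):
        data = (data,)
    return list(data)


records = parse_records(raw)
df = pd.DataFrame(records, columns=cols)

# Optional: convert the timestamp strings to real datetime values.
df['Event Time'] = pd.to_datetime(df['Event Time'])

print(df.shape)
print(df.dtypes)
```

ast.literal_eval only evaluates Python literals (strings, numbers, tuples, lists and so on), so it is a safer choice than eval for text you did not write yourself. It will raise an error if the file contains anything that is not a valid literal, so a file with stray text around the tuples would need cleaning first.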
