user2517214

Reputation: 25

convert list to dataframe in python

I have a text file with a column header line and data. I am trying to convert the file's data into a pandas DataFrame.

File:

#Columns: TargetDoc|GRank|LRank|Priority|Loc ID
aaaaa|1|1|Slow|8gkahinka.01
aaaaa|1|0|Slow|7nlafnjbaflnbja.01

I wrote the code below. First I split each line into a list, then I try to convert that list to a DataFrame:

import os
import pandas as pd

with open("DocID101_201604070523.txt") as raw_file:
    full_file_text = raw_file.readlines()

raw_file.close()

data_list = list()
for l in full_file_text:
    if l.startswith('#'):
        labels = l.strip().replace('#Columns: ','').split('|')
    else:
        data_list += l.strip().split('|')

df = pd.DataFrame.from_records(data_list,columns=labels)

But I got an error when creating df:

AssertionError: 5 columns passed, passed data had 10 columns.

What's wrong with my code, or is there a better way to convert this to a DataFrame?

Upvotes: 2

Views: 3642

Answers (2)

EdChum

Reputation: 393863

You can just read in the file using read_csv with sep='|' and then fix the first column name as a post processing step using rename:

In [228]:
import io
import pandas as pd    
t="""#Columns: TargetDoc|GRank|LRank|Priority|Loc ID
aaaaa|1|1|Slow|8gkahinka.01
aaaaa|1|0|Slow|7nlafnjbaflnbja.01"""
df = pd.read_csv(io.StringIO(t), sep='|')
df

Out[228]:
  #Columns: TargetDoc  GRank  LRank Priority              Loc ID
0               aaaaa      1      1     Slow        8gkahinka.01
1               aaaaa      1      0     Slow  7nlafnjbaflnbja.01

Now rename the first column: pass rename a dict whose key is the current first column name and whose value is the last whitespace-separated token of that name:

In [229]:
df.rename(columns={df.columns[0]:df.columns[0].split()[-1]}, inplace=True)
df

Out[229]:
  TargetDoc  GRank  LRank Priority              Loc ID
0     aaaaa      1      1     Slow        8gkahinka.01
1     aaaaa      1      0     Slow  7nlafnjbaflnbja.01

So in your case:

df = pd.read_csv("DocID101_201604070523.txt", sep='|')

and then rename the first column as shown above.
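Putting the two steps together, here is a minimal sketch. The helper name read_pipe_file is hypothetical (it is not part of pandas), and the sample text is copied from the question so the snippet runs without the real file. It assumes the header line always starts with the literal prefix '#Columns: '.

```python
import io
import pandas as pd

# Sample text copied from the question; a file path can be passed instead.
text = """#Columns: TargetDoc|GRank|LRank|Priority|Loc ID
aaaaa|1|1|Slow|8gkahinka.01
aaaaa|1|0|Slow|7nlafnjbaflnbja.01"""


def read_pipe_file(source):
    """Hypothetical helper: read a '|'-separated file and clean the header."""
    df = pd.read_csv(source, sep='|')
    # Only the first column name carries the '#Columns: ' prefix.
    first = df.columns[0]
    return df.rename(columns={first: first.replace('#Columns: ', '', 1)})


df = read_pipe_file(io.StringIO(text))
print(df.columns.tolist())
```

For the file in the question, the call would be read_pipe_file("DocID101_201604070523.txt"). Returning the renamed frame instead of using inplace=True is only a style choice here.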

Upvotes: 3

iFlo

Reputation: 1484

That's because you are concatenating all rows into one flat list with:

data_list += l.strip().split('|')

What you want is:

data_list.append(l.strip().split('|'))

This way, you will get a list of lists, each with 5 elements.
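The difference between the two operations can be seen in a small sketch. The lines are hard-coded from the question's sample file rather than read from disk, and the variable names flat and rows are only illustrative.

```python
import pandas as pd

# The three lines of the sample file, hard-coded for illustration.
lines = [
    "#Columns: TargetDoc|GRank|LRank|Priority|Loc ID",
    "aaaaa|1|1|Slow|8gkahinka.01",
    "aaaaa|1|0|Slow|7nlafnjbaflnbja.01",
]

labels = []
flat = []  # built with +=, as in the question
rows = []  # built with append, as suggested here
for line in lines:
    if line.startswith('#'):
        labels = line.strip().replace('#Columns: ', '').split('|')
    else:
        fields = line.strip().split('|')
        flat += fields       # extends the list with 5 separate strings
        rows.append(fields)  # adds one 5-element list as a single row

print(len(flat))               # 10 loose values, no row boundaries
print(len(rows), len(rows[0])) # 2 rows of 5 fields each

df = pd.DataFrame(rows, columns=labels)
```

Note that every value built this way is a string, so GRank and LRank are not converted to integers as they are with read_csv.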

Edit: But the solution above, using read_csv with a separator, is highly recommended.

Upvotes: 1
