w.F

Reputation: 21

How can I read a TSV file with irregular line breaks?

How do I read a data file whose rows have different lengths? I'm trying to load a TSV (tab-separated) file in which each record should have 19 attributes. Some lines contain only 4 attributes, and the next line contains the remaining ones. Every record has all 19 attributes; the file just has irregular line breaks. How can I handle this? I want to store the records in a table as a dataset so that I can use iloc to extract a column as a list.

I ran the following code, but got the error "Error tokenizing data. C error: Expected 16 fields in line 32770, saw 19"

    import numpy as np
    import pandas as pd

    dataset = pd.read_csv('data1.tsv', sep="\t", header=None)
    T = dataset.iloc[:, 8].values

The file looks like this:

(line1)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
(line2)1 2 3 4 
(line3)5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
(line4)1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
...

Upvotes: 1

Views: 717

Answers (1)

w.F

Reputation: 21

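I solved this problem, although my code is rather ugly. It only handles the case where a record is split into a 4-field line followed by a 15-field line: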


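    import csv

    dataset = []
    tmp = []
    with open('data1.tsv', newline='') as csvfile:
        spamreader = csv.reader(csvfile, delimiter='\t')
        for row in spamreader:
            if len(row) == 19:
                # complete record on a single line
                dataset.append(row)
            elif len(row) == 4:
                # first part of a broken record: keep it for the next line
                tmp = row
            elif len(row) == 15:
                # second part: join it with the stored first part
                tmp.extend(row)
                dataset.append(tmp)
                tmp = []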
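
A somewhat more general variant of the same idea might look like the sketch below. Instead of hard-coding the 4 + 15 split, it keeps collecting fields until a record reaches the expected count. The function name `read_split_records` and its parameters are made up for illustration, and the sketch assumes every stray line break falls between two fields, never inside one.

```python
import csv
import io


def read_split_records(lines, n_fields=19, delimiter="\t"):
    """Rebuild records that were broken across several physical lines.

    Assumption: every break falls between two fields, so the fragments
    of one record add up to exactly n_fields values.
    """
    records = []
    pending = []  # fields collected so far for the current record
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        pending.extend(row)
        if len(pending) == n_fields:
            records.append(pending)
            pending = []
        elif len(pending) > n_fields:
            raise ValueError(
                f"fragments add up to {len(pending)} fields, expected {n_fields}"
            )
    if pending:
        raise ValueError(f"incomplete record at end of input: {pending}")
    return records


# Small in-memory sample with 5 fields per record instead of 19.
sample = io.StringIO("a\tb\tc\td\te\n1\t2\n3\t4\t5\nx\ty\tz\tw\tv\n")
rows = read_split_records(sample, n_fields=5)
print(rows)
```

For the real file, the function could be called on a handle opened with `open('data1.tsv', newline='')`. The resulting list of rows can then be passed to `pd.DataFrame(rows)`, after which `iloc[:, 8].values` works as in the question. All values are strings at that point, so numeric columns would still need converting.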

Upvotes: 1
