whh1294

Reputation: 63

How to read a data frame from a txt file that has no separator and no uniform column width with pandas

I'm working with raw data in a text file. However, it has no separator and the columns are not all the same width. For example, column 1 is 12 characters long, column 2 is 5, and so forth. The definition of the file is something like this

I was wondering whether some package has a function that can handle this kind of file, given the length of each column. One approach I think might work is to use a regular expression and iterate over each row and column.

Upvotes: 2

Views: 2142

Answers (2)

VBB

Reputation: 1325

This is still a fixed-width file: that just means the size of each field is fixed, not that all fields have the same size. So you can use pandas.read_fwf with the widths argument set to [21,5,5,12...] to read it. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html
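
A minimal sketch of that call, assuming a made-up three-column layout with widths 12, 5 and 6. The sample rows, the column names and the widths are placeholders for illustration, not the asker's real file definition:

```python
import io

import pandas as pd

# Hypothetical sample data: widths 12, 5 and 6, no separators between fields.
raw = (
    "ABCDEFGHIJKL12345 67.89\n"
    "MNOPQRSTUVWX00042  1.50\n"
)

df = pd.read_fwf(
    io.StringIO(raw),              # a path such as "text.txt" works here too
    widths=[12, 5, 6],             # one width per column, in file order
    header=None,                   # the sample has no header row
    names=["id", "code", "value"], # assumed column names
    dtype={"code": str},           # keep leading zeros in the code column
)
print(df)
```

Passing dtype for a column is optional; without it pandas infers the type, which would turn a code like 00042 into the integer 42.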

Upvotes: 4

Tony

Reputation: 1290

The easiest way, assuming there are no separators, would just be to hard code the string slices:

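with open("text.txt") as src, open("text.csv", "w") as dst:  # "text.csv" is a placeholder output name
  for row in src:
    row = row.rstrip("\n")
    # row is a str and has no write(), so write the joined slices to the output file
    dst.write(row[0:12]+","+row[12:17]+","+row[17:23]+"\n")  # ... finish with the remaining slices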

Then you could just specify the separator when you create the dataframe.
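
If there are many columns, the slice boundaries can be computed from a list of widths instead of being typed out by hand. This is only a sketch: the helper name split_fixed, the widths and the sample rows are all made up for illustration.

```python
import csv
import io
from itertools import accumulate


def split_fixed(line, widths):
    """Cut one line into fields at the cumulative offsets given by widths."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


widths = [12, 5, 6]  # assumed column widths
raw_lines = [
    "ABCDEFGHIJKL12345 67.89\n",
    "MNOPQRSTUVWX00042  1.50\n",
]

# Write the fields as CSV into an in-memory buffer; a real file works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
for line in raw_lines:
    writer.writerow(split_fixed(line.rstrip("\n"), widths))

csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.writer rather than joining with "," means a field that itself contains a comma gets quoted, so the result stays readable by pandas.read_csv.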

Upvotes: 1
