Reading data file with unequal number of columns using numpy

Question

I have a .dat file with numbers. In the first row, this file has five columns, and in all subsequent rows, it has four columns. I want to be able to read this file using numpy. I encounter the following error when I try to read this file at present:

In [3]: F1 = np.loadtxt('file.dat')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
----> 1 F1 = np.loadtxt('file.dat')

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding)
   1090         # converting the data
   1091         X = None
-> 1092         for x in read_data(_loadtxt_chunksize):
   1093             if X is None:
   1094                 X = np.array(x, dtype)

/Users/usr/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.pyc in read_data(chunk_size)
   1014                 line_num = i + skiprows + 1
   1015                 raise ValueError("Wrong number of columns at line %d"
-> 1016                                  % line_num)
   1017 
   1018             # Convert each value according to its column and store

ValueError: Wrong number of columns at line 2

How can I read all the rows of the file except the first one using Python? I have attached an example file here.

Additionally, the first column of this file (excluding the first row) has n^2 entries (in the example, n=3 and the entries of the column are 1,2,3,4,5,6,7,8,9). I want to read the first column (excluding the first row) and save it as a text file of shape (n,n), i.e. the text file should have n rows and n columns. That is to say, I want the saved matrix to have the entries in the following order:

1.0 2.0 3.0
4.0 5.0 6.0
7.0 8.0 9.0

I will be thankful to have help.

Zheng Liu · Accepted Answer

Some steps to try (not optimized). First, read in the lines of the file:

Edit: the 'file.dat' file contains empty lines. The if line.strip() != '' clause filters them out.

with open('file.dat', 'r') as fhand:
    # Strip the trailing newline from each line and skip empty lines.
    file_lines = [line.rstrip('\n') for line in fhand if line.strip() != '']

The first row has a different number of columns, so drop it:

file_lines.pop(0)

Now that the remaining lines all have the same number of numerical columns, you can split each line into its entries and convert them to floats:

mat_raw = [[float(term) for term in line.split()] for line in file_lines]

This gives you a matrix of floats as a list of lists. For convenient slicing, convert it into a NumPy array:

import numpy

mat = numpy.array(mat_raw)
# then you can do whatever you like, e.g. take the first column:
first_col = mat[:, 0]
# reshape it to an n-by-n matrix (n = 3 for the example file):
n = 3
res = first_col.reshape((n, n))
...
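
The question also asks for the reshaped matrix to be saved as a text file, which the steps above stop short of. Here is a minimal sketch of that last step, assuming the first column is already in a NumPy array. The sample values, the inferred n, and the output file name matrix.txt are illustrative assumptions, not taken from the original file.

```python
import math
import os
import tempfile

import numpy as np

# Stand-in for the first column read from the file (the question's n = 3 case).
first_col = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

# Infer n from the column length and check that the length is a perfect square.
n = int(round(math.sqrt(first_col.size)))
if n * n != first_col.size:
    raise ValueError("expected n*n entries, got %d" % first_col.size)

# Row-major reshape: the first n entries become the first row, and so on.
res = first_col.reshape((n, n))

# Save with one decimal place; 'matrix.txt' is a hypothetical file name,
# written to a temporary directory so the sketch leaves no files behind.
out_path = os.path.join(tempfile.mkdtemp(), "matrix.txt")
np.savetxt(out_path, res, fmt="%.1f")

# Read the file back to inspect what was written.
with open(out_path) as fhand:
    saved_text = fhand.read()
```

np.savetxt separates columns with a single space by default, so the file layout follows the row order of res. Change fmt if you need more decimal places.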

Depending on the format of your file and your goal, you may optimise this code for your own use.
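
As one possible shortcut, np.loadtxt can do the splitting and float conversion itself once the odd first row is out of the way. The sketch below is only an illustration: the sample text is a made-up stand-in for file.dat (five columns in the first row, four afterwards, plus a blank line), and it passes the filtered lines to np.loadtxt as a list of strings.

```python
import io

import numpy as np

# Hypothetical stand-in for 'file.dat': a five-column first row, then
# four-column rows, with one blank line in between.
sample = io.StringIO(
    "0 0 0 0 0\n"
    "1 10 11 12\n"
    "2 20 21 22\n"
    "\n"
    "3 30 31 32\n"
    "4 40 41 42\n"
)

# Keep only the non-empty lines, then drop the first (five-column) row.
rows = [line for line in sample if line.strip()][1:]

# np.loadtxt accepts a list of strings; usecols=0 keeps only the first column.
first_col = np.loadtxt(rows, usecols=0)
```

For a file on disk, np.loadtxt('file.dat', skiprows=1, usecols=0) may be enough on its own, since skiprows appears in the loadtxt signature shown in the traceback. I have not checked that against the attached file, so treat it as something to try.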
