lyche

Reputation: 125

Create a list of lists / matrices from a text file in Python 3.4

I need to create a set of matrices from the file below: the rows with the same value of Z should go into a matrix together.

Below is a shortened version of my txt file:

X  Y    Z
-1 10   0
1  20   5
2  15   10
2  50   10
2  90   10
3  15   11
4  50   11
5  90   11
6  13   14
7  50   14
8  70   14
8  95   14
8  75   14

So, for example, my first matrix will be

[-1, 10, 0]

my second one will be

[1, 20, 5]

my third will be

[[2, 15, 10],
 [2, 50, 10],
 [2, 90, 10]]

and so on.

I've looked at a few questions related to this but nothing seems to be quite right.

I started by making each column an array. I was thinking a for loop might work well. So far I have

f = open("data.txt", "r")
header1 = f.readline()
for line in f:
    line = line.strip()
    columns = line.split()
    x = columns[0]
    y = columns[1]
    z = columns[2]
i = line in f
z.old = line(i-1,4)
i=1
for line in f:
    f.readline(i)
    if z(0) == [i,3]:
       line(i) = matrix[i,:]
    else z(0) != [i,3]:
         store line(i) as M
         continue
    i = i+1

However, I'm getting 'invalid syntax' for the line

else z(0) != line(4):

By this else clause, I mean that if z(0) (the initial z) is not equal to line(4), then this line should be stored as the first line of the next matrix to be checked by this code.

However, I'm not sure how well this would work.

Any help would be greatly appreciated!

Upvotes: 0

Views: 857

Answers (2)

Martin Evans

Reputation: 46759

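The following should work for your data. It assumes the columns in your text file are tab-delimited and that rows with the same Z value are adjacent, since itertools.groupby only groups consecutive rows:

import csv
import itertools
import operator

# Python 3: open in text mode with newline='' as the csv module expects
with open('input.txt', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter='\t')
    headers = next(csv_input)  # skip the "X Y Z" header row
    row_number = 1

    # Group consecutive rows by the Z column (index 2)
    for k, g in itertools.groupby(csv_input, key=operator.itemgetter(2)):
        row = []
        for entry in g:
            entry = [float(e) for e in entry]
            # Prepend a running row number to each row
            row.append([row_number] + entry)
            row_number += 1
        print(row)

This would print the following output:

[[1, -1.0, 10.0, 0.0]]
[[2, 1.0, 20.0, 5.0]]
[[3, 2.0, 15.0, 10.0], [4, 2.0, 50.0, 10.0], [5, 2.0, 90.0, 10.0]]
[[6, 3.0, 15.0, 11.0], [7, 4.0, 50.0, 11.0], [8, 5.0, 90.0, 11.0]]
[[9, 6.0, 13.0, 14.0], [10, 7.0, 50.0, 14.0], [11, 8.0, 70.0, 14.0], [12, 8.0, 95.0, 14.0], [13, 8.0, 75.0, 14.0]]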

If your file is exactly as shown in the question, i.e. with spaces separating the columns, then you will need to change the csv.reader line as follows:

csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
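
If you would rather not depend on the delimiter at all, a plain str.split() handles any run of spaces or tabs. The following is only a minimal sketch under a few assumptions: the values are integers as in the sample, rows with equal Z are adjacent in the file, and the function name group_rows_by_z and the inline sample (standing in for your data file) are made up for illustration.

```python
import io
from itertools import groupby


def group_rows_by_z(lines):
    """Return a list of matrices, one per run of consecutive rows sharing a Z value."""
    rows = []
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # ignore blank lines
        # Assumption: every value is an integer, as in the sample data
        rows.append([int(value) for value in fields])
    # groupby only merges adjacent rows, so the input must be ordered by Z
    return [list(group) for _, group in groupby(rows, key=lambda r: r[2])]


# Hypothetical stand-in for open("data.txt"); the rows are the question's sample
sample = io.StringIO("""X  Y    Z
-1 10   0
1  20   5
2  15   10
2  50   10
2  90   10
3  15   11
4  50   11
5  90   11
6  13   14
7  50   14
8  70   14
8  95   14
8  75   14
""")

next(sample)  # skip the header line
matrices = group_rows_by_z(sample)
for matrix in matrices:
    print(matrix)
```

For the sample this gives five matrices, and matrices[2] is [[2, 15, 10], [2, 50, 10], [2, 90, 10]]. If equal Z values could be scattered through the file, sort the rows by Z before grouping.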

Upvotes: 1

luator

Reputation: 5019

The following much simpler code loads the file with NumPy and prepends a row number to each row:

import numpy as np

# Load the file using numpy (skip the first row which contains the header)
foo = np.loadtxt("/path/to/your/data-file", skiprows=1)

# Prepend a column with the row number
first_col = np.arange(foo.shape[0]) + 1  # +1 because we don't want to start with 0
bar = np.hstack((first_col[:, None], foo))

You can now access the individual rows via bar[0], bar[1], ...
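
To get one matrix per Z value, the loaded array can be split wherever Z changes from one row to the next. This is a rough sketch rather than a definitive solution: it assumes the rows are already ordered by Z, it works on the array as loaded (Z in column index 2, before any row-number column is prepended), and the inline sample and the names data, boundaries and matrices are illustrative only.

```python
import io

import numpy as np

# Hypothetical stand-in for the data file; the rows are the question's sample
sample = io.StringIO("""X  Y    Z
-1 10   0
1  20   5
2  15   10
2  50   10
2  90   10
3  15   11
4  50   11
5  90   11
6  13   14
7  50   14
8  70   14
8  95   14
8  75   14
""")

# Load the numbers, skipping the header row
data = np.loadtxt(sample, skiprows=1)

# Row indices at which Z differs from the previous row
boundaries = np.flatnonzero(np.diff(data[:, 2])) + 1

# Cut the array at those indices: one sub-array per run of equal Z
matrices = np.split(data, boundaries)

for matrix in matrices:
    print(matrix)
```

Each element of matrices is a 2-D array of floats. If a row-number column has been prepended as in bar, Z moves to column index 3, so the index in the np.diff call changes accordingly.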

Upvotes: 1
