How to create a 2d numpy array from a list of tuples

Question

I have a large text file with three elements in each row - user, question, value. I would like to create a 2d numpy array from this data. The data sample is something like this:

114250 3 1
124400 7 4
111304 1 1

Unfortunately I don't know the size of the resulting matrix beforehand and thus cannot initialize it.

I managed to read the data into a list of 3-tuples with this code (converting the arbitrary user ids to linear 1,2,3... representation):

import fileinput

users = dict()
data = list()
user_counter = 0    # last linear id handed out; the first user gets 1

for line in fileinput.input( args[0] ):
    tokens = line.split("\t")
    tokens = [ t.strip("\n").strip("\r") for t in tokens ]
    user = tokens[0]
    question = tokens[1]
    response = tokens[2]

    if user in users.keys():
        user_id = users.get( user )     # existing user
    else:
        user_counter = user_counter + 1 # add new user
        users[user] = user_counter
        user_id = user_counter

    data.append( (int(user_id), int(question), int(response)) )

I am not sure how to convert this list of tuples to a 2D numpy array. I would love to know how to do this in a Pythonic way.

There should be some method that reads every tuple, uses user_id and question as the row and column, and puts the response value into the 2D numpy array. For example, a tuple like

(10,3,1)

means that I would like to put the value 1 into a 2D matrix row 10, column 3.

Daniel · Accepted Answer

Simply generate the matrix afterwards:

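import numpy as np

data = np.array(data)   # shape (n, 3): columns are user_id, question, response
# Size the matrix from the largest indices (+1 because indexing starts at 0)
result = np.zeros(shape=(data[:,0].max()+1, data[:,1].max()+1), dtype=int)
# Write each response at [user_id, question] in one vectorized assignment
result[data[:,0], data[:,1]] = data[:,2]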
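
To see the approach on concrete input, here is a minimal sketch using a hypothetical three-tuple sample. The values are made up to mirror the question's format, and it assumes the user ids have already been remapped to 1, 2, 3 and that every index is a non-negative integer:

```python
import numpy as np

# Hypothetical sample in the question's (user_id, question, response) form.
data = [(1, 3, 1), (2, 7, 4), (3, 1, 1)]

arr = np.array(data)  # shape (3, 3): one row per tuple
rows, cols, values = arr[:, 0], arr[:, 1], arr[:, 2]

# Size the matrix from the largest indices; +1 because indexing is 0-based.
result = np.zeros((rows.max() + 1, cols.max() + 1), dtype=int)

# Integer-array indexing writes every value at its (row, column) pair.
result[rows, cols] = values

print(result.shape)   # (4, 8)
print(result[2, 7])   # 4
```

Because the ids in this sample start at 1, row 0 and column 0 stay all zeros. If that matters, subtracting 1 from the index columns before building the matrix would be one way to avoid the unused row and column.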
