Coder

Reputation: 67

Using NumPy to read values from a file into a 2-D array, but getting odd results for rows and columns

The problem is that for my (row, columns) tuple I am getting a strange result: (2,)

This is my first time using NumPy, and the result I am getting is not at all what I wanted. (This is not the full output, which is long; I have removed some entries.)

[['5.9', '3.0', '5.1', '1.8', 'Iris-virginica']
 array([['6.2', '3.4', '5.4', '2.3', 'Iris-virginica'],
       array([['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
       array([['4.9', '3.0', '1.4', '0.2', 'Iris-setosa'],...
      dtype='|S11')], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], ...

The result that I wanted looks more like this:

[['5.9', '3.0', '5.1', '1.8', 'Iris-virginica'], ['6.2', '3.4', '5.4', '2.3', 'Iris-virginica'],..

I don't understand exactly what I've done wrong, but I'm pretty sure that fixing the first problem (the nested output) will also fix the second one (the shape). Here is my entire code:

import numpy as np
import os
import matplotlib.pyplot as plt

def create_data(input_data_file):
    detector  = open(input_data_file, "r")
    num_lines = sum(1 for line in detector)
    detector.close()

    infile = open(input_data_file, "r")
    line = infile.readline()
    elements = line.count(',') + 1
    linelist = []
    data = np.array([])
    #dimensions are num_ lines and the elements
    while(line!= ""):
        linelist = line.split(',')
        actuallist = []
        for elem in linelist:
            if (elem.count("\n")>0):
                elem = elem.rstrip()
                actuallist.append(elem)
            else:
                actuallist.append(elem)
        line = infile.readline()
        if (data != ([])):
            data = np.array((actuallist, data))
        else:
            data = np.array((actuallist))
    infile.close()
    print data
    print data.shape
    return(data)

plot_data(create_data(os.path.expanduser("~/Downloads/iris.txt")))

Upvotes: 0

Views: 473

Answers (2)

Logan Byers

Reputation: 1573

This is easily solved using NumPy's built-in np.loadtxt function (see the NumPy docs).

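import os
import numpy as np

# np.loadtxt does not expand "~" itself, so expand it first
data = np.loadtxt(os.path.expanduser('~/Downloads/iris.txt'), delimiter=',',
                  dtype={'names': ('col0', 'col1', 'col2', 'col3', 'col4'),
                         'formats': (float, float, float, float, '|S30')})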
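
To see what this returns without needing the file, here is a minimal sketch that feeds np.loadtxt two rows taken from the question through an in-memory buffer. The io.StringIO buffer, the sample_text name and the 'U30' string format are my own stand-ins for illustration, not part of the original data or answer. Note that a structured dtype gives a 1-D array of records, so its shape is (rows,); if you want a plain 2-D array of strings like the one in the question, passing dtype=str is one option.

```python
import io
import numpy as np

# Hypothetical stand-in for iris.txt: two rows copied from the question
sample_text = (
    "5.9,3.0,5.1,1.8,Iris-virginica\n"
    "6.2,3.4,5.4,2.3,Iris-virginica\n"
)

# Structured dtype: four float columns plus one text column
iris_dtype = {
    "names": ("col0", "col1", "col2", "col3", "col4"),
    "formats": (float, float, float, float, "U30"),
}
records = np.loadtxt(io.StringIO(sample_text), delimiter=",", dtype=iris_dtype)

print(records.shape)     # 1-D: one record per line
print(records["col0"])   # first column, parsed as floats
print(records["col4"])   # species names

# Alternative: keep every field as a string and get a 2-D array
strings = np.loadtxt(io.StringIO(sample_text), delimiter=",", dtype=str)
print(strings.shape)     # (rows, columns)
```

With the structured version you index columns by name (records["col0"]); with the string version you index by position (strings[:, 0]) and convert to float yourself where needed.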

Upvotes: 2

hpaulj

Reputation: 231335

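The problem is that on every iteration you wrap the new row and the existing array together in a new two-element array. The result nests one level deeper with each line, which is why the final shape is (2,). It is better to build a list of rows and create the array once at the end: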

# linelist = []   # doesn't do anything useful
# data = np.array([])
#dimensions are num_ lines and the elements
alist = []
while(line!= ""):
    linelist = line.split(',')
    actuallist = []
    for elem in linelist:
        if (elem.count("\n")>0):
            elem = elem.rstrip()
            actuallist.append(elem)
        else:
            actuallist.append(elem)
    line = infile.readline()
    alist.append(actuallist)
data = np.array(alist)       # make array from list of lists

It might help to print(alist) before using it in data, to make sure it makes sense.
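
For a self-contained version of the same idea, here is a rough sketch. The rows_to_array helper and the sample_lines list are names I made up for illustration; with the real file you would pass the open file object instead of the list.

```python
import numpy as np

def rows_to_array(lines):
    """Split comma-separated lines into rows and build one 2-D array."""
    rows = []
    for line in lines:
        line = line.strip()          # drop the trailing newline
        if not line:                 # skip blank lines
            continue
        rows.append(line.split(","))
    return np.array(rows)            # convert once, at the end

# Hypothetical sample: three rows copied from the question
sample_lines = [
    "5.9,3.0,5.1,1.8,Iris-virginica\n",
    "6.2,3.4,5.4,2.3,Iris-virginica\n",
    "4.7,3.2,1.3,0.2,Iris-setosa\n",
]

data = rows_to_array(sample_lines)
print(data.shape)   # (rows, columns)
print(data)
```

Because every row has the same number of fields, np.array sees a regular list of lists and produces a proper 2-D array of strings. The numeric columns can then be converted with something like data[:, :4].astype(float).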

Upvotes: 1
