Reputation: 67
The problem is that the (rows, columns) shape tuple I get back is strange: (2,)
This is my first time using NumPy, and the result I am getting is not at all what I wanted. (This is not exactly what I got; the output is long, so I have removed some entries.)
[['5.9', '3.0', '5.1', '1.8', 'Iris-virginica']
array([['6.2', '3.4', '5.4', '2.3', 'Iris-virginica'],
array([['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
array([['4.9', '3.0', '1.4', '0.2', 'Iris-setosa'],...
dtype='|S11')], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], dtype=object)], ...
The result that I wanted looks more like this:
[['5.9', '3.0', '5.1', '1.8', 'Iris-virginica'], ['6.2', '3.4', '5.4', '2.3', 'Iris-virginica'],..
I don't understand exactly what I've done wrong, but I'm fairly sure that fixing the first problem will also fix the second one. Here is my entire code:
import numpy as np
import os
import matplotlib.pyplot as plt

def create_data(input_data_file):
    detector = open(input_data_file, "r")
    num_lines = sum(1 for line in detector)
    detector.close()
    infile = open(input_data_file, "r")
    line = infile.readline()
    elements = line.count(',') + 1
    linelist = []
    data = np.array([])
    #dimensions are num_ lines and the elements
    while(line!= ""):
        linelist = line.split(',')
        actuallist = []
        for elem in linelist:
            if (elem.count("\n")>0):
                elem = elem.rstrip()
                actuallist.append(elem)
            else:
                actuallist.append(elem)
        line = infile.readline()
        if (data != ([])):
            data = np.array((actuallist, data))
        else:
            data = np.array((actuallist))
    infile.close()
    print data
    print data.shape
    return(data)

plot_data(create_data(os.path.expanduser("~/Downloads/iris.txt")))
Upvotes: 0
Views: 473
Reputation: 1573
This is easily solved with NumPy's built-in np.loadtxt function (see the np.loadtxt documentation).
# np.loadtxt does not expand "~", so expand it first (needs `import os`)
np.loadtxt(os.path.expanduser('~/Downloads/iris.txt'), delimiter=',',
           dtype={'names': ('col0', 'col1', 'col2', 'col3', 'col4'),
                  'formats': (float, float, float, float, '|S30')})
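To see what the structured dtype gives you without needing the file on disk, here is a minimal sketch that feeds np.loadtxt a small in-memory sample. The three rows are copied from the output in the question; the `sample` and `iris_dtype` names are only for illustration, and I have assumed Python 3, where 'U30' returns the labels as str rather than the bytes that '|S30' would give.

```python
import io
import numpy as np

# Hypothetical three-row sample in the same comma-separated layout as iris.txt.
sample = io.StringIO(
    "5.9,3.0,5.1,1.8,Iris-virginica\n"
    "6.2,3.4,5.4,2.3,Iris-virginica\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Four float columns followed by one text column of up to 30 characters.
iris_dtype = {
    "names": ("col0", "col1", "col2", "col3", "col4"),
    "formats": (float, float, float, float, "U30"),
}

records = np.loadtxt(sample, delimiter=",", dtype=iris_dtype)

print(records.shape)    # one record per input line
print(records["col0"])  # first measurement column, as floats
print(records["col4"])  # species labels, as strings
```

Note that the result is a 1-D array of records, not a 2-D array, so you select a column by field name (records["col0"]) instead of by index.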
Upvotes: 2
Reputation: 231335
The problem is that each pass through the loop wraps the new row and the previous array in a new array, so the result keeps nesting instead of growing by one row. It is better to build a list of rows and create the array once at the end:
# linelist = []   # doesn't do anything useful
# data = np.array([])
#dimensions are num_ lines and the elements
alist = []
while(line!= ""):
    linelist = line.split(',')
    actuallist = []
    for elem in linelist:
        if (elem.count("\n")>0):
            elem = elem.rstrip()
            actuallist.append(elem)
        else:
            actuallist.append(elem)
    line = infile.readline()
    alist.append(actuallist)
data = np.array(alist)  # make array from list of lists
It might help to print(alist) before using it in data, to make sure it makes sense.
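For reference, here is a compact, self-contained sketch of the same idea: collect the rows in a plain list and call np.array once. The `rows_to_array` helper and the in-memory `sample` are my own illustration (not part of the question's code), and the three rows are copied from the output shown in the question.

```python
import io
import numpy as np

def rows_to_array(lines):
    """Collect each comma-separated line as a list, then build the array once."""
    rows = []
    for line in lines:
        line = line.strip()   # drop the trailing newline
        if not line:          # skip blank lines, e.g. at the end of the file
            continue
        rows.append(line.split(","))
    return np.array(rows)     # list of equal-length lists -> 2-D array

# Hypothetical stand-in for open("iris.txt"); any iterable of lines works.
sample = io.StringIO(
    "5.9,3.0,5.1,1.8,Iris-virginica\n"
    "6.2,3.4,5.4,2.3,Iris-virginica\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

data = rows_to_array(sample)
print(data.shape)                      # (rows, columns)

# Everything is still a string here; convert the numeric columns explicitly.
measurements = data[:, :4].astype(float)
print(measurements.mean(axis=0))       # per-column averages
```

Because every row has the same number of fields, np.array produces a regular 2-D array of strings, and data.shape is the (rows, columns) tuple the question was after.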
Upvotes: 1