Reputation: 183
How to read string column only using numpy in python?
csv file:
1,2,3,"Hello"
3,3,3,"New"
4,5,6,"York"
How to get array like:
["Hello","york","New"]
without using pandas and sklearn.
Upvotes: 0
Views: 5131
Reputation: 99
To extract specific values into the numpy array one approach could be:
with open('Exercise1.csv', 'r') as file:
file_content = list(csv.reader(file, delimiter=","))
data = np.array(file_content)
print(file_content[1][1], len(file_content))
for i in range(1, len(file_content)):
patient.append(file_content[i][0])
first_column_array = np.array(patient, dtype=(''))
i iterates through the rows of data and j is the place of the value in the row, so for 0, the first value
Upvotes: 0
Reputation: 3926
import numpy
fname = 'sample.csv'
csv = numpy.genfromtxt(fname, dtype=str, delimiter=",")
names = csv[:,-1]
print(names)
Choosing the data type The main way to control how the sequences of strings we have read from the file are converted to other types is to set the dtype argument. Acceptable values for this argument are:
a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the names argument (see below). Note that dtype=float is the default for genfromtxt. a sequence of types, such as dtype=(int, float, float). a comma-separated string, such as dtype="i4,f8,|U3". a dictionary with two keys 'names' and 'formats'. a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)]. an existing numpy.dtype object. the special value None. In that case, the type of the columns will be determined from the data itself (see below).
When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a string can be converted to a boolean (that is, if the string matches true or false in lower cases); then whether it can be converted to an integer, then to a float, then to a complex and eventually to a string. This behavior may be changed by modifying the default mapper of the StringConverter class.
The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.
Upvotes: 1
Reputation: 231625
A quick file substitute:
In [275]: txt = b'''
...: 1,2,3,"Hello"
...: 3,3,3,"New"
...: 4,5,6,"York"'''
In [277]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None,usecols=3)
Out[277]:
array([b'"Hello"', b'"New"', b'"York"'],
dtype='|S7')
bytestring array in Py3; or a default unicode string dtype:
In [278]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=str,usecols=3)
Out[278]:
array(['"Hello"', '"New"', '"York"'],
dtype='<U7')
Or the whole thing:
In [279]: data=np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None)
In [280]: data
Out[280]:
array([(1, 2, 3, b'"Hello"'), (3, 3, 3, b'"New"'), (4, 5, 6, b'"York"')],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', 'S7')])
select the f3
field:
In [282]: data['f3']
Out[282]:
array([b'"Hello"', b'"New"', b'"York"'],
dtype='|S7')
Speed should be basically the same
Upvotes: 0
Reputation: 323366
I give the column name as a,b,c,d
in csv
import numpy as np
ary=np.genfromtxt(r'yourcsv.csv',delimiter=',',dtype=None)
ary.T[-1]
Out[139]:
array([b'd', b'Hello', b'New', b'York'],
dtype='|S5')
Upvotes: 3