lupin

Reputation: 183

Reading a particular column in csv file using numpy in python

How do I read only the string column of a CSV file using NumPy in Python?

csv file:

1,2,3,"Hello"
3,3,3,"New"
4,5,6,"York"

How can I get an array like:

["Hello","New","York"]

without using pandas or sklearn?

Upvotes: 0

Views: 5131

Answers (4)

Anonymous

Reputation: 99

To extract specific values into a NumPy array, one approach could be:

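import csv
import numpy as np

with open('Exercise1.csv', 'r', newline='') as file:
    # csv.reader understands the quoting, so "Hello" comes back as Hello
    file_content = list(csv.reader(file, delimiter=","))

j = 3          # position of the wanted value within each row (0 = first value)
column = []
for i in range(len(file_content)):   # use range(1, ...) to skip a header row
    column.append(file_content[i][j])
column_array = np.array(column, dtype=str)
print(column_array)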

Here i iterates through the rows of the data and j is the position of the value within a row, so j = 0 would select the first value.
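Once the rows are in a 2D array, the loop can be replaced by a single column slice. A minimal sketch, assuming the question's three rows; io.StringIO stands in for the real file so the snippet runs on its own:

```python
import csv
import io

import numpy as np

# Hypothetical in-memory stand-in for the question's CSV file
sample = '1,2,3,"Hello"\n3,3,3,"New"\n4,5,6,"York"\n'

rows = list(csv.reader(io.StringIO(sample), delimiter=","))
data = np.array(rows)     # 2D array of str, shape (3, 4)
words = data[:, 3]        # column index 3 is the string column
print(words.tolist())     # ['Hello', 'New', 'York']
```

Because csv.reader handles the quote characters, no extra stripping is needed here, unlike with genfromtxt.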

Upvotes: 0

Ramineni Ravi Teja

Reputation: 3926

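import numpy as np

fname = 'sample.csv'
# dtype=str reads every field as a unicode string, giving a 2D array
data = np.genfromtxt(fname, dtype=str, delimiter=",")
names = data[:, -1]                  # last column: '"Hello"', '"New"', '"York"'
# genfromtxt does not interpret quote characters, so strip them explicitly
names = np.char.strip(names, '"')
print(names)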

Choosing the data type

The main way to control how the sequences of strings we have read from the file are converted to other types is to set the dtype argument. Acceptable values for this argument are:

- a single type, such as dtype=float. The output will be 2D with the given dtype, unless a name has been associated with each column with the use of the names argument (see below). Note that dtype=float is the default for genfromtxt.
- a sequence of types, such as dtype=(int, float, float).
- a comma-separated string, such as dtype="i4,f8,|U3".
- a dictionary with two keys 'names' and 'formats'.
- a sequence of tuples (name, type), such as dtype=[('A', int), ('B', float)].
- an existing numpy.dtype object.
- the special value None. In that case, the type of the columns will be determined from the data itself (see below).

When dtype=None, the type of each column is determined iteratively from its data. We start by checking whether a string can be converted to a boolean (that is, if the string matches true or false in lower cases); then whether it can be converted to an integer, then to a float, then to a complex and eventually to a string. This behavior may be changed by modifying the default mapper of the StringConverter class.

The option dtype=None is provided for convenience. However, it is significantly slower than setting the dtype explicitly.
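To see two of these dtype options applied to the question's data, here is a minimal sketch. The list of lines stands in for the file (genfromtxt also accepts a list of lines), and the field names a to d are made up for illustration; encoding=None makes string columns come back as str rather than bytes:

```python
import numpy as np

# Hypothetical stand-in for the question's file
lines = ['1,2,3,"Hello"', '3,3,3,"New"', '4,5,6,"York"']

# Explicit dtype given as a sequence of (name, type) tuples
typed = np.genfromtxt(lines, delimiter=",", encoding=None,
                      dtype=[("a", int), ("b", int), ("c", int), ("d", "U10")])
print(typed["d"])

# dtype=None: each column's type is inferred from the data (fields f0..f3)
inferred = np.genfromtxt(lines, delimiter=",", dtype=None, encoding=None)
print(inferred.dtype)

# genfromtxt keeps the quote characters, so strip them if they are not wanted
clean = np.char.strip(inferred["f3"], '"')
print(clean.tolist())   # ['Hello', 'New', 'York']
```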

Upvotes: 1

hpaulj

Reputation: 231625

A quick substitute for a file (genfromtxt also accepts a list of lines):

In [275]: txt = b'''
     ...: 1,2,3,"Hello"
     ...: 3,3,3,"New"
     ...: 4,5,6,"York"'''

In [277]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None,usecols=3)
Out[277]: 
array([b'"Hello"', b'"New"', b'"York"'],
      dtype='|S7')

That gives a bytestring array in Python 3; with dtype=str you get the default unicode string dtype instead:

In [278]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=str,usecols=3)
Out[278]: 
array(['"Hello"', '"New"', '"York"'],
      dtype='<U7')

Or load the whole file:

In [279]: data=np.genfromtxt(txt.splitlines(), delimiter=',',dtype=None)
In [280]: data
Out[280]: 
array([(1, 2, 3, b'"Hello"'), (3, 3, 3, b'"New"'), (4, 5, 6, b'"York"')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', 'S7')])

Then select the f3 field:

In [282]: data['f3']
Out[282]: 
array([b'"Hello"', b'"New"', b'"York"'],
      dtype='|S7')

Speed should be basically the same
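The results above are bytestrings that still carry their quote characters. A minimal sketch for turning them into the plain strings the question asks for, assuming UTF-8 data; the bytes array is rebuilt by hand so the snippet runs alone:

```python
import numpy as np

# The 'S7' bytes array from the session above, rebuilt here for a standalone example
raw = np.array([b'"Hello"', b'"New"', b'"York"'])

text = np.char.decode(raw, "utf-8")   # bytes -> str
clean = np.char.strip(text, '"')      # drop the surrounding quote characters
print(clean.tolist())                 # ['Hello', 'New', 'York']
```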

Upvotes: 0

BENY

Reputation: 323366

I gave the columns the names a,b,c,d in the first row of the CSV:

import numpy as np
ary=np.genfromtxt(r'yourcsv.csv',delimiter=',',dtype=None)
ary.T[-1]
Out[139]: 
array([b'd', b'Hello', b'New', b'York'],
      dtype='|S5')
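With a header row in the file, ary.T[-1] includes the header value b'd' as its first element. A minimal sketch of an alternative, assuming the same a,b,c,d header: the names=True option of genfromtxt reads the first line as field names, so the column can be selected by name. The list of lines is a stand-in for the file:

```python
import numpy as np

# Hypothetical stand-in for yourcsv.csv, including the a,b,c,d header row
lines = ['a,b,c,d', '1,2,3,"Hello"', '3,3,3,"New"', '4,5,6,"York"']

# names=True uses the first line as field names instead of data
ary = np.genfromtxt(lines, delimiter=",", dtype=None, names=True, encoding=None)
print(ary.dtype.names)   # ('a', 'b', 'c', 'd')
print(ary["d"])          # values still include the quote characters
```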

Upvotes: 3
