user308827

Reputation: 21961

convert numpy array from object dtype to float

How do I convert the following NumPy array from object dtype to float?

array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
       '1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
       '124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
       '108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
       '100,626'], dtype=object)

I tried arr.astype(float), but that does not work because of the commas in each string.

Upvotes: 2

Views: 17426

Answers (4)

hpaulj

Reputation: 231335

Yet another way

np.frompyfunc(lambda x: x.replace(',',''),1,1)(arr).astype(float)

frompyfunc returns an object dtype array, which is fine in this case. I've often found it to be 2x faster than a list comprehension, but here it times about the same as @coldspeed's:

np.array([v.replace(',', '') for v in arr], dtype=np.float32)

That may be because we are starting with an object dtype array. Direct iteration on an object dtype array is a bit slower than iteration on a list, but faster than iteration on a regular numpy array. Like a list, the elements of the array are pointers to strings, and don't require the 'unboxing' that a string dtype array would.

It is also 2 to 3x faster than the np.char version.
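The comparison above can be reproduced with a rough timing sketch like the one below. The array contents, the repeat counts, and the helper names (strip_commas, via_frompyfunc, and so on) are assumptions made for illustration, and the actual timings will vary with the machine and NumPy version.

```python
import timeit

import numpy as np

# Hypothetical test data: a few of the question's values, repeated.
arr = np.array(['4,364,541', '2,330,200', '962,100', '100,626'] * 1000,
               dtype=object)

# frompyfunc wraps a Python function as a ufunc that returns object arrays.
strip_commas = np.frompyfunc(lambda s: s.replace(',', ''), 1, 1)


def via_frompyfunc(a):
    return strip_commas(a).astype(float)


def via_list_comprehension(a):
    return np.array([v.replace(',', '') for v in a], dtype=float)


def via_np_char(a):
    # np.char functions expect a string dtype, so cast from object first.
    return np.char.replace(a.astype(str), ',', '').astype(float)


for fn in (via_frompyfunc, via_list_comprehension, via_np_char):
    seconds = timeit.timeit(lambda: fn(arr), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

All three helpers should produce the same float array; only the running time differs.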

Upvotes: 2

rafaelc

Reputation: 59264

You can also use numpy.core.defchararray.replace():

>>> numpy.core.defchararray.replace(arr, ',','').astype(float)

array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
       1098200., 1065106.,  962100.,  920200.,  124204.,  122320.,
        119742.,  116627.,  115900.,  108400.,  108400.,  108000.,
        103795.,  102900.,  101845.,  100900.,  100626.])

Or use np.char.replace, as Cold noted in the comments. This module is built for arrays of type numpy.string_ or numpy.unicode_ (np.bytes_ and np.str_ in current NumPy).

If the array has object dtype, convert it to a string dtype first:

np.char.replace(arr.astype(str), ',', '').astype(float)
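As a small step-by-step sketch of the object-dtype case, assuming a shortened three-element version of the question's array (the intermediate names as_text and cleaned are only for illustration):

```python
import numpy as np

arr = np.array(['4,364,541', '962,100', '100,626'], dtype=object)

# Cast object -> fixed-width unicode so the np.char routines accept it.
as_text = arr.astype(str)
print(as_text.dtype.kind)  # 'U'

# Element-wise string replace, then parse the cleaned strings as floats.
cleaned = np.char.replace(as_text, ',', '')
values = cleaned.astype(float)
print(values)
```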

Upvotes: 1

dawg

Reputation: 103714

Given:

>>> ar
array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
       '1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
       '124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
       '108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
       '100,626'], dtype=object)

You can use filter to remove all non-digit characters and create floats:

>>> np.array(list(map(float, (''.join(filter(lambda c: c.isdigit(), s)) for s in ar))))
array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
       1098200., 1065106.,  962100.,  920200.,  124204.,  122320.,
        119742.,  116627.,  115900.,  108400.,  108400.,  108000.,
        103795.,  102900.,  101845.,  100900.,  100626.])

Upvotes: 1

cs95

Reputation: 402263

A simple way to do it is to remove every comma:

np.array([v.replace(',', '') for v in arr], dtype=np.float32)

If you have pandas, to_numeric is a good option. With errors='coerce', any values that are still invalid after the replacement become NaN instead of raising an error.

pd.to_numeric([v.replace(',', '') for v in arr], errors='coerce', downcast='float')

Both methods return a float array as output.
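To illustrate the handling of invalid values, here is a minimal sketch that assumes pandas is installed and uses a made-up array containing one unparseable entry. The downcast argument is left out to keep the example focused on the coercion.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the middle entry cannot be parsed as a number.
raw = np.array(['4,364,541', 'unknown', '920,200'], dtype=object)

cleaned = [v.replace(',', '') for v in raw]

# errors='coerce' turns unparseable entries into NaN instead of raising.
values = pd.to_numeric(cleaned, errors='coerce')
print(values)
```

The plain np.array(..., dtype=np.float32) version would raise a ValueError on the same input, so which one fits depends on whether bad values should stop the conversion.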

Upvotes: 2
