Reputation: 21961
How do I convert the following NumPy array from object dtype to float:
array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
'1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
'124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
'108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
'100,626'], dtype=object)
I tried arr.astype(float), but that does not work because of the commas in each string.
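For reference, a minimal reproduction of the failure. It uses a shortened copy of the array above (an assumption: three values are enough to show the behavior):

```python
import numpy as np

# A few of the values from the question, stored as an object array of str
arr = np.array(['4,364,541', '2,330,200', '962,100'], dtype=object)

try:
    arr.astype(float)
    failed = False
except ValueError as exc:
    # NumPy converts each str much like float() does, and the ',' is rejected
    failed = True
    print("astype(float) raised:", exc)

# The same strings parse fine once the commas are removed
parsed = [float(s.replace(',', '')) for s in arr]
print(parsed)
```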
Upvotes: 2
Views: 17426
Reputation: 231335
Yet another way
np.frompyfunc(lambda x: x.replace(',',''),1,1)(arr).astype(float)
frompyfunc returns an object dtype array, which is fine in this case. I've often found it to be 2x faster than a list comprehension, but here it times about the same as @coldspeed's:
np.array([v.replace(',', '') for v in arr], dtype=np.float32)
That may be because we are starting with an object dtype array. Direct iteration on an object dtype array is a bit slower than iteration on a list, but faster than iteration on a regular numpy array. Like a list, the elements of the array are pointers to strings, and don't require the 'unboxing' that a string dtype array would. (Both approaches are also 2 to 3x faster than the np.char version.)
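To check these timing claims on your own machine, here is a rough benchmark sketch. Assumptions: the test data (the question's format repeated to 10,000 elements), the function names, and using dtype=float for the list comprehension instead of np.float32 so all three results can be compared directly. Absolute times and ratios will vary with NumPy version and hardware.

```python
import timeit

import numpy as np

# Hypothetical test data: values in the question's format, repeated so the
# timings are large enough to measure.
base = ['4,364,541', '2,330,200', '962,100', '124,204', '100,626']
arr = np.array(base * 2000, dtype=object)

# Wrap the Python-level str.replace as a ufunc (1 input, 1 output).
strip_commas = np.frompyfunc(lambda x: x.replace(',', ''), 1, 1)


def via_frompyfunc(a):
    # strip_commas returns an object array of str; astype converts to float64
    return strip_commas(a).astype(float)


def via_list_comprehension(a):
    return np.array([v.replace(',', '') for v in a], dtype=float)


def via_np_char(a):
    # np.char expects a string dtype, so cast the object array first
    return np.char.replace(a.astype(str), ',', '').astype(float)


methods = [via_frompyfunc, via_list_comprehension, via_np_char]
results = [m(arr) for m in methods]

# Elements of an object array are plain Python str objects, while a
# fixed-width unicode array returns numpy.str_ scalars on access.
print(type(arr[0]), type(arr.astype(str)[0]))

for m in methods:
    seconds = timeit.timeit(lambda: m(arr), number=20)
    print(f"{m.__name__:>24}: {seconds:.4f} s for 20 runs")
```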
Upvotes: 2
Reputation: 59264
You can also use numpy.core.defchararray.replace():
>>> numpy.core.defchararray.replace(arr, ',', '').astype(float)
array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
1098200., 1065106., 962100., 920200., 124204., 122320.,
119742., 116627., 115900., 108400., 108400., 108000.,
103795., 102900., 101845., 100900., 100626.])
Or np.char.replace, as noted in the comments by Cold. Naturally, this module is built for arrays of type numpy.string_ or numpy.unicode_ (numpy.bytes_ and numpy.str_ in NumPy 2.0 and later). If the array has object dtype, convert it first:
np.char.replace(a.astype(np.str_), ',', '').astype(float)
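A small sketch of that object-to-string conversion step, assuming a shortened sample array. It shows the fixed-width unicode dtype that np.char actually operates on:

```python
import numpy as np

arr = np.array(['4,364,541', '962,100', '100,626'], dtype=object)

# Casting to str picks a fixed-width unicode dtype wide enough for the
# longest element ('4,364,541' has 9 characters, so U9 here).
as_str = arr.astype(str)
print(as_str.dtype)

# np.char.replace works element-wise on the unicode array
no_commas = np.char.replace(as_str, ',', '')
result = no_commas.astype(float)
print(result)
```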
Upvotes: 1
Reputation: 103714
Given:
>>> ar
array(['4,364,541', '2,330,200', '2,107,648', '1,525,711', '1,485,231',
'1,257,500', '1,098,200', '1,065,106', '962,100', '920,200',
'124,204', '122,320', '119,742', '116,627', '115,900', '108,400',
'108,400', '108,000', '103,795', '102,900', '101,845', '100,900',
'100,626'], dtype=object)
You can use filter to remove all non-digit characters and then create floats:
>>> np.array(list(map(float, (''.join(filter(lambda c: c.isdigit(), s)) for s in ar))))
array([4364541., 2330200., 2107648., 1525711., 1485231., 1257500.,
1098200., 1065106., 962100., 920200., 124204., 122320.,
119742., 116627., 115900., 108400., 108400., 108000.,
103795., 102900., 101845., 100900., 100626.])
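One caveat: keeping only digits also discards decimal points and minus signs. A small comparison, using hypothetical inputs that contain both (the question's data has neither, so the filter approach is fine there):

```python
import numpy as np

ar = np.array(['1,234.5', '-2,000', '962,100'], dtype=object)

# Keeping only digits (the filter approach) also drops '.' and '-'
digits_only = np.array([float(''.join(filter(str.isdigit, s))) for s in ar])

# Removing only the thousands separator keeps the sign and decimal point
commas_removed = np.array([float(s.replace(',', '')) for s in ar])

print(digits_only)
print(commas_removed)
```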
Upvotes: 1
Reputation: 402263
A simple way to do it is to remove every comma:
np.array([v.replace(',', '') for v in arr], dtype=np.float32)
If you have pandas, pd.to_numeric is a good option. With errors='coerce', any values that are still invalid after the replacement become NaN instead of raising an error.
pd.to_numeric([v.replace(',', '') for v in arr], errors='coerce', downcast='float')
Both methods return a float array (float32 here, because of dtype=np.float32 and downcast='float').
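If pandas is not available, a rough pure-NumPy stand-in for the errors='coerce' behavior can be written with a loop and try/except. The helper name to_float_coerce is hypothetical, and this is not how pandas implements to_numeric:

```python
import numpy as np


def to_float_coerce(values):
    """Hypothetical helper: strip commas and parse each value as float,
    turning anything unparseable into NaN (similar in spirit to
    pd.to_numeric(..., errors='coerce'))."""
    out = np.empty(len(values), dtype=np.float64)
    for i, v in enumerate(values):
        try:
            out[i] = float(str(v).replace(',', ''))
        except ValueError:
            # e.g. 'n/a' or None (str(None) == 'None') cannot be parsed
            out[i] = np.nan
    return out


arr = np.array(['4,364,541', 'n/a', '100,626', None], dtype=object)
r = to_float_coerce(arr)
print(r)
```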
Upvotes: 2