Reputation: 2243
I have a column that has values like '10 million' and '5 billion' and would like an easy way to just convert that into a numerical value to do some further analysis. I tried
powers = {'billion': 10 ** 9, 'million': 10 ** 6}
def f(s):
try:
power = s[-1]
return float(s[:-1]) * powers[power]
except TypeError:
return s
df_2.applymap(f)
Update: My pandas Column consists of 0's (which are the NaN's) and other values which include millions and billions. I hope this is clearer than the previous I used the method recommended by @MobiusKlein below. So here is the useful stack trace error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-1db4b2353170> in <module>()
10 return float(quantity) * powers[magnitude]
11
---> 12 df_2.applymap(f)
13
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in applymap(self, func)
3725 x = lib.map_infer(_values_from_object(x), f)
3726 return lib.map_infer(_values_from_object(x), func)
-> 3727 return self.apply(infer)
3728
3729 #----------------------------------------------------------------------
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
3556 if reduce is None:
3557 reduce = True
-> 3558 return self._apply_standard(f, axis, reduce=reduce)
3559 else:
3560 return self._apply_broadcast(f, axis)
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
3646 try:
3647 for i, v in enumerate(series_gen):
-> 3648 results[i] = func(v)
3649 keys.append(v.name)
3650 except Exception as e:
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in infer(x)
3724 f = com.i8_boxer(x)
3725 x = lib.map_infer(_values_from_object(x), f)
-> 3726 return lib.map_infer(_values_from_object(x), func)
3727 return self.apply(infer)
3728
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:56671)()
<ipython-input-12-1db4b2353170> in f(num_str)
4
5 def f(num_str):
----> 6 match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
7 if match is not None:
8 quantity = match.group(0)
/home/peadarcoyle/.virtualenvs/Ipython/lib/python2.7/re.pyc in search(pattern, string, flags)
140 """Scan through string looking for a match to the pattern, returning
141 a match object, or None if no match was found."""
--> 142 return _compile(pattern, flags).search(string)
143
144 def sub(pattern, repl, string, count=0, flags=0):
TypeError: ('expected string or buffer', u'occurred at index Intended_Investment')
Upvotes: 0
Views: 4845
Reputation: 1423
Your function to query the string for a number word doesn't take into account spaces or more than one digit leading. Try something a little more complex:
import re
powers = {'billion': 10 ** 9, 'million': 10 ** 6}
def f(num_str):
match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
if match is not None:
quantity = match.group(0)
magnitude = match.group(1)
return float(quantity) * powers[magnitude]
This will throw errors if it cannot extract the proper tokens from the string, but it handles white space and irregular quantities of powers. If you're concerned about floating point error, instead, use int() to cast quantity to a numerical type, but make sure you're not dealing with decimals. If you are, you can play games with magnitude to fix that, but it makes the code more complicated than necessary for a first pass.
Upvotes: 1