Peadar Coyle
Peadar Coyle

Reputation: 2243

Converting millions and billion to a number or integer in a Python column

I have a column that has values like '10 million' and '5 billion' and would like an easy way to just convert that into a numerical value to do some further analysis. I tried

powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(s):
    try:
        power = s[-1]
        return float(s[:-1]) * powers[power]
    except TypeError:
        return s

df_2.applymap(f)

Update: My pandas Column consists of 0's (which are the NaN's) and other values which include millions and billions. I hope this is clearer than the previous I used the method recommended by @MobiusKlein below. So here is the useful stack trace error.

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-1db4b2353170> in <module>()
     10       return float(quantity) * powers[magnitude]
     11 
---> 12 df_2.applymap(f)
     13 

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in applymap(self, func)
   3725                 x = lib.map_infer(_values_from_object(x), f)
   3726             return lib.map_infer(_values_from_object(x), func)
-> 3727         return self.apply(infer)
   3728 
   3729     #----------------------------------------------------------------------

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3556                     if reduce is None:
   3557                         reduce = True
-> 3558                     return self._apply_standard(f, axis, reduce=reduce)
   3559             else:
   3560                 return self._apply_broadcast(f, axis)

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3646             try:
   3647                 for i, v in enumerate(series_gen):
-> 3648                     results[i] = func(v)
   3649                     keys.append(v.name)
   3650             except Exception as e:

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in infer(x)
   3724                 f = com.i8_boxer(x)
   3725                 x = lib.map_infer(_values_from_object(x), f)
-> 3726             return lib.map_infer(_values_from_object(x), func)
   3727         return self.apply(infer)
   3728 

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:56671)()

<ipython-input-12-1db4b2353170> in f(num_str)
      4 
      5 def f(num_str):
----> 6    match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
      7    if match is not None:
      8       quantity = match.group(0)

/home/peadarcoyle/.virtualenvs/Ipython/lib/python2.7/re.pyc in search(pattern, string, flags)
    140     """Scan through string looking for a match to the pattern, returning
    141     a match object, or None if no match was found."""
--> 142     return _compile(pattern, flags).search(string)
    143 
    144 def sub(pattern, repl, string, count=0, flags=0):

TypeError: ('expected string or buffer', u'occurred at index Intended_Investment')

Upvotes: 0

Views: 4845

Answers (1)

mobiusklein
mobiusklein

Reputation: 1423

Your function to query the string for a number word doesn't take into account spaces or more than one digit leading. Try something a little more complex:

import re
powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(num_str):
   match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
   if match is not None:
      quantity = match.group(0)
      magnitude = match.group(1)
      return float(quantity) * powers[magnitude]

This will throw errors if it cannot extract the proper tokens from the string, but it handles white space and irregular quantities of powers. If you're concerned about floating point error, instead, use int() to cast quantity to a numerical type, but make sure you're not dealing with decimals. If you are, you can play games with magnitude to fix that, but it makes the code more complicated than necessary for a first pass.

Upvotes: 1

Related Questions