Reputation: 30404
I thought I understood map vs applymap pretty well, but am having a problem (see here for additional background, if interested).
A simple example:
import pandas as pd
df = pd.DataFrame( [[1,2],[1,1]] )
dct = { 1:'python', 2:'gator' }
df[0].map( lambda x: x+90 )
df.applymap( lambda x: x+90 )
That works as expected -- both operate on an elementwise basis, map on a series, applymap on a dataframe (explained very well here btw).
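Here is a self-contained version of the snippet above for anyone running it on a current install. One assumption worth flagging: pandas 2.1 renamed `DataFrame.applymap` to `DataFrame.map` and deprecated the old name, so the hypothetical helper `elementwise` below uses whichever method exists.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 1]])


def elementwise(frame, func):
    """Hypothetical helper: call func on every cell of frame.

    pandas >= 2.1 names this DataFrame.map; older versions name it applymap.
    """
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


col_result = df[0].map(lambda x: x + 90)          # Series.map: one column
frame_result = elementwise(df, lambda x: x + 90)  # every cell of the frame

print(col_result.tolist())           # [91, 91]
print(frame_result.values.tolist())  # [[91, 92], [91, 91]]
```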
If I use a dictionary rather than a lambda, map still works fine:
df[0].map( dct )
0 python
1 python
but not applymap:
df.applymap( dct )
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-100-7872ff604851> in <module>()
----> 1 df.applymap( dct )
C:\Users\johne\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in applymap(self, func)
3856 x = lib.map_infer(_values_from_object(x), f)
3857 return lib.map_infer(_values_from_object(x), func)
-> 3858 return self.apply(infer)
3859
3860 #----------------------------------------------------------------------
C:\Users\johne\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
3687 if reduce is None:
3688 reduce = True
-> 3689 return self._apply_standard(f, axis, reduce=reduce)
3690 else:
3691 return self._apply_broadcast(f, axis)
C:\Users\johne\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
3777 try:
3778 for i, v in enumerate(series_gen):
-> 3779 results[i] = func(v)
3780 keys.append(v.name)
3781 except Exception as e:
C:\Users\johne\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in infer(x)
3855 f = com.i8_boxer(x)
3856 x = lib.map_infer(_values_from_object(x), f)
-> 3857 return lib.map_infer(_values_from_object(x), func)
3858 return self.apply(infer)
3859
C:\Users\johne\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:56990)()
TypeError: ("'dict' object is not callable", u'occurred at index 0')
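The last line of the traceback is the real clue: the dict is being *called* like a function. The quick check below is a sketch. `method` simply picks `DataFrame.map` on pandas 2.1+ and `applymap` on older versions. It shows that a dict is not callable while its bound `get` method is, and that the DataFrame-wide elementwise method rejects the bare dict with a `TypeError`. The exact message differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 1]])
dct = {1: 'python', 2: 'gator'}

print(callable(dct))      # False: a dict cannot be called like f(x)
print(callable(dct.get))  # True: its bound .get method can

# The elementwise DataFrame method expects a callable, so a bare dict fails.
# (applymap was renamed DataFrame.map in pandas 2.1.)
method = df.map if hasattr(pd.DataFrame, "map") else df.applymap
try:
    method(dct)
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```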
So, my question is why don't map and applymap work in an analogous manner here? Is it a bug with applymap, or am I doing something wrong?
Edit to add: I have discovered that I can work around this fairly easily with this:
df.applymap( lambda x: dct[x] )
0 1
0 python gator
1 python python
Or better yet via this answer which requires no lambda.
df.applymap( dct.get )
So that is pretty much exactly equivalent, right? It must be something about how applymap handles its argument, and I guess the explicit form of a function/method works better than a dictionary. Anyway, I guess there is no practical problem remaining, but I am still interested in what is going on here if anyone wants to answer.
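The two workarounds are only equivalent when every value is a key of `dct`. Below is a minimal sketch on a Series, with a value `3` deliberately left out of the dict. The same callables behave the same way cell by cell under `applymap`.

```python
import pandas as pd

dct = {1: 'python', 2: 'gator'}
s = pd.Series([1, 2, 3])  # 3 is deliberately missing from dct

via_map = s.map(dct)        # dict lookup: the missing key becomes NaN
via_get = s.apply(dct.get)  # dict.get returns None for the missing key

try:
    s.apply(lambda x: dct[x])  # plain indexing: the missing key raises
    via_index_raised = False
except KeyError:
    via_index_raised = True

print(via_map.tolist())
print(via_get.tolist())
print(via_index_raised)  # True
```

So `dct.get` is closer to `Series.map(dct)`, which also yields a missing value instead of raising, while `lambda x: dct[x]` fails on the first unknown value.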
Upvotes: 10
Views: 5990
Reputation: 320
.applymap() and .map() do both work element-wise. But .applymap() doesn't take each column and call .map() on it; it calls .apply() on each column instead.
So when you call df.applymap(dct), what effectively happens is df[0].apply(dct) (and likewise for every other column), not df[0].map(dct).
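To make the "apply on each column" description concrete, here is a rough re-creation of the strategy. `applymap_like` is hypothetical and made up for illustration; it is not pandas' actual code. `DataFrame.apply` passes each column to an inner function, which calls `Series.apply`, so `func` ends up being invoked once per cell. Because it relies on calling `func`, it needs a callable such as `dct.get` rather than the dict itself.

```python
import pandas as pd


def applymap_like(frame, func):
    """Hypothetical sketch of applymap's strategy (not pandas source):
    DataFrame.apply hands each column to the lambda, and Series.apply
    then calls func once for every value in that column."""
    return frame.apply(lambda col: col.apply(func))


df = pd.DataFrame([[1, 2], [1, 1]])
dct = {1: 'python', 2: 'gator'}

print(applymap_like(df, dct.get).values.tolist())
# [['python', 'gator'], ['python', 'python']]
```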
And here is the difference between these two Series methods:
.map() accepts a Series, a dict, or a function (any callable, so methods like dict.get work too) as its first argument, whereas .apply() only accepts a function (or any other callable).
.map() contains if statements to figure out whether the argument passed is a dict, a Series, or a function, and acts accordingly. When you pass a function to .map(), it does the same thing as .apply().
But .apply() doesn't have those if statements that let it handle a dictionary or a Series properly. It only knows how to work with a callable. (Newer pandas versions do accept a dict in .apply(), but they treat it as a specification of aggregations, not as a lookup table.)
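The branching described above can be sketched like this. `simple_map` is a hypothetical, heavily simplified stand-in written for illustration. The real `Series.map` handles many more cases (`na_action`, dict subclasses that define `__missing__`, extension dtypes, and so on). The shared idea is that it checks the argument type first and only falls back to calling the argument per element.

```python
import pandas as pd


def simple_map(series, arg):
    """Hypothetical, simplified stand-in for Series.map's type dispatch.
    Not the real pandas implementation."""
    if isinstance(arg, dict):
        # treat the dict as a lookup table by turning it into a Series
        arg = pd.Series(arg)
    if isinstance(arg, pd.Series):
        # look each value up in arg's index; values not found become NaN
        return pd.Series(arg.reindex(series.values).values, index=series.index)
    if callable(arg):
        # plain function: call it once per element
        return pd.Series([arg(x) for x in series], index=series.index)
    raise TypeError(f"cannot map with {type(arg).__name__}")


s = pd.Series([1, 2, 1])
dct = {1: 'python', 2: 'gator'}

print(simple_map(s, dct).tolist())               # ['python', 'gator', 'python']
print(simple_map(s, pd.Series(dct)).tolist())    # ['python', 'gator', 'python']
print(simple_map(s, lambda x: x + 90).tolist())  # [91, 92, 91]
```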
When you call .apply() or .map() with a function, both end up calling lib.map_infer(), which seems to behave like Python's built-in map() function (but I'm unable to put my hands on the source code, so I'm not completely sure).
Calling the built-in map(dct, df[0]) gives the same error as df.applymap(dct) (in Python 3 the error only shows up once the lazy map object is consumed, e.g. with list(map(dct, df[0]))). In the pandas version used in the question, df[0].apply(dct) gives the same error too.
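The comparison with Python's built-in `map()` is easy to check in plain Python, no pandas needed. Unlike the Python 2 setup in the question, Python 3's `map()` returns a lazy iterator, so the `TypeError` is only raised once the result is consumed, here with `list()`.

```python
dct = {1: 'python', 2: 'gator'}
values = [1, 2, 1]

lazy = map(dct, values)  # no error yet: nothing has been called
message = ""
try:
    list(lazy)           # consuming the iterator calls dct(1) -> TypeError
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised, message)             # True 'dict' object is not callable
print(list(map(dct.get, values)))  # ['python', 'gator', 'python']
```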
Now, you may ask: why use .apply() instead of .map(), if .map() does the same thing when called with a function and can also take a dict or a Series?
Because .apply() can return a DataFrame if the function you pass to it returns a Series:
import pandas
ser = pandas.Series([1,2,3,4,5], index=range(5))
ser_map = ser.map(lambda x : pandas.Series([x]*5, index=range(5)))
type(ser_map)
pandas.core.series.Series
ser_app = ser.apply(lambda x : pandas.Series([x]*5, index=range(5)))
type(ser_app)
pandas.core.frame.DataFrame
Upvotes: 7