mapping dataframes not series pandas

Question

I am new to pandas, and I am trying to map multiple columns as opposed to just one. This page shows me how to do it with a pd.Series, but I cannot figure out how to map multiple columns.

Here are the two DataFrames I am trying to map.

import numpy as np
import pandas as pd

data2 = pd.DataFrame(np.random.randn(5, 2), index=range(0, 5), columns=['x', 'y'])
data2['Cluster'] = ['A', 'B', 'A', 'B', 'C']
centers2 = pd.DataFrame(np.random.randint(0, 10, size=(3, 2)), index=['A', 'B', 'C'], columns=['x', 'y'])

Here data2 looks like:

data2

          x         y Cluster
0  0.151212 -0.168855       A
1 -0.078935  1.933378       B
2 -0.388903  0.444610       A
3  0.622089  1.609730       B
4 -0.346856  1.095834       C

and centers2 looks like:

centers2
   x  y
A  6  4
B  6  0
C  4  1

I wish to create two separate columns in data2 holding the matching values from centers2. Here is my manual attempt:

data2['Centers.x']=[6,6,6,6,4]
data2['Centers.y']=[4,0,4,0,1]
data2
          x         y Cluster  Centers.x  Centers.y
0  0.151212 -0.168855       A          6          4
1 -0.078935  1.933378       B          6          0
2 -0.388903  0.444610       A          6          4
3  0.622089  1.609730       B          6          0
4 -0.346856  1.095834       C          4          1

How can I do this with the map function? (I know how to do this using loops, I need a vectorized solution.)

Stefan · Accepted Answer

For a pd.DataFrame, .merge() comes closest to pd.Series.map(). You can rename overlapping columns with the suffixes keyword, for instance suffixes=['', '_centers'].

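Note that pd.Series doesn't have .merge(). pd.DataFrame had no .map() when this was written; the DataFrame.map() added in pandas 2.1 is an element-wise rename of applymap(), not a key lookup.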
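If you would rather stay with map, one possible approach is to call pd.Series.map() once per column of centers2, since map accepts a Series and looks each value up in that Series' index. The sketch below hard-codes the same values as the example frames in this answer; the '_centers' column names are just a naming choice for illustration.

```python
import pandas as pd

# Fixed illustrative data (same values as the example frames in this answer).
data2 = pd.DataFrame({
    'x': [-1.406449, 1.002103, 0.353894, 1.249199, 0.623962],
    'y': [-0.244859, 0.214346, 0.353995, -0.661904, -1.754789],
    'Cluster': ['A', 'B', 'A', 'B', 'C'],
})
centers2 = pd.DataFrame({'x': [0, 6, 0], 'y': [9, 9, 6]}, index=['A', 'B', 'C'])

# Series.map with a Series argument looks each Cluster label up in the
# index of centers2[col] and returns the matching value.
for col in centers2.columns:
    data2[col + '_centers'] = data2['Cluster'].map(centers2[col])

print(data2)
```

This stays vectorized per column; the Python loop only runs over the column names, not over the rows.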

With

data2
          x         y Cluster
0 -1.406449 -0.244859       A
1  1.002103  0.214346       B
2  0.353894  0.353995       A
3  1.249199 -0.661904       B
4  0.623962 -1.754789       C

centers2
   x  y
A  0  9
B  6  9
C  0  6

You get:

data2.merge(centers2, left_on='Cluster', right_index=True, suffixes=['', '_centers']).sort_index()

          x         y Cluster  x_centers  y_centers
0 -1.406449 -0.244859       A          0          9
1  1.002103  0.214346       B          6          9
2  0.353894  0.353995       A          0          9
3  1.249199 -0.661904       B          6          9
4  0.623962 -1.754789       C          0          6
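One thing to watch: .merge() defaults to how='inner', so a row whose Cluster label has no entry in centers2 is dropped. Passing how='left' keeps such rows and fills the center columns with NaN. A minimal sketch, where the label 'D' and the x/y values are made up purely to show the difference:

```python
import pandas as pd

# Illustrative data; 'D' is a hypothetical label with no row in centers2.
data2 = pd.DataFrame({
    'x': [0.1, 0.2, 0.3],
    'y': [1.0, 2.0, 3.0],
    'Cluster': ['A', 'B', 'D'],
})
centers2 = pd.DataFrame({'x': [0, 6, 0], 'y': [9, 9, 6]}, index=['A', 'B', 'C'])

# Default inner merge: the 'D' row disappears.
inner = data2.merge(centers2, left_on='Cluster', right_index=True,
                    suffixes=('', '_centers'))

# Left merge: every row of data2 is kept; unmatched centers become NaN.
left = data2.merge(centers2, left_on='Cluster', right_index=True,
                   how='left', suffixes=('', '_centers'))

print(inner)
print(left)
```

With how='left' the result behaves like a lookup that never loses rows, which is closer to what pd.Series.map() does.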

There is also .join(), which is a thin wrapper around .merge() for a single DataFrame, or around pd.concat() when several DataFrames are joined on their index. From the source:

def join(self, other, on=None, how='left', lsuffix='', rsuffix='',
         sort=False):
    return self._join_compat(other, on=on, how=how, lsuffix=lsuffix,
                             rsuffix=rsuffix, sort=sort)

def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
                 sort=False):
    from pandas.tools.merge import merge, concat

    if isinstance(other, Series):
        if other.name is None:
            raise ValueError('Other Series must have a name')
        other = DataFrame({other.name: other})

    if isinstance(other, DataFrame):
        return merge(self, other, left_on=on, how=how,
                     left_index=on is None, right_index=True,
                     suffixes=(lsuffix, rsuffix), sort=sort)
    else:
        if on is not None:
            raise ValueError('Joining multiple DataFrames only supported'
                             ' for joining on index')
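As the excerpt shows, .join() with on= forwards to merge with left_on=on and right_index=True, and its default is how='left'. A short sketch of the equivalent call, again using the example values from this answer and '_centers' as an arbitrary suffix:

```python
import pandas as pd

# Fixed illustrative data (same values as the example frames in this answer).
data2 = pd.DataFrame({
    'x': [-1.406449, 1.002103, 0.353894, 1.249199, 0.623962],
    'y': [-0.244859, 0.214346, 0.353995, -0.661904, -1.754789],
    'Cluster': ['A', 'B', 'A', 'B', 'C'],
})
centers2 = pd.DataFrame({'x': [0, 6, 0], 'y': [9, 9, 6]}, index=['A', 'B', 'C'])

# on='Cluster' matches the column in data2 against the index of centers2.
# rsuffix renames the overlapping x/y columns that come from centers2.
result = data2.join(centers2, on='Cluster', rsuffix='_centers')

print(result)
```

Because the join is a left join by default, the rows keep the order and index of data2, so no .sort_index() is needed here.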
