Reputation: 1639
I've spent some time googling and didn't find an answer to this simple question: how can I map a column of a pandas DataFrame in place? Say I have the following df:
In [67]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [68]: frame
Out[68]:
               b         d         e
Utah   -1.240032  1.586191 -1.272617
Ohio   -0.161516 -2.169133  0.223268
Texas  -1.921675  0.246167 -0.744242
Oregon  0.371843  2.346133  2.083234
And I want to add 1 to each value in column b. I know that I can do it like this:
In [69]: frame['b'] = frame['b'].map(lambda x: x + 1)
Or like this (AFAIK there is no difference between map and apply in the context of a Series, except that map can also accept a dict or a Series; correct me if I'm wrong):
In [71]: frame['b'] = frame['b'].apply(lambda x: x + 1)
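To make the comparison concrete, here is a minimal sketch of the point above, using a small made-up Series (the values and the lookup dict are purely illustrative): with a function, map and apply give the same element-wise result, while only map takes a dict as a lookup table.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# With a function, map and apply produce the same element-wise result.
mapped = s.map(lambda v: v + 1)
applied = s.apply(lambda v: v + 1)

# map also accepts a dict (or a Series) as a lookup table;
# values missing from the table become NaN.
labels = s.map({1: 'one', 2: 'two'})
```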
But I don't like specifying 'b' twice. Instead, I would like to do something like this:
frame['b'].map(lambda x: x + 1, inplace=True)
Is it possible?
Upvotes: 20
Views: 22956
Reputation: 4521
You can use add. It returns a new Series, so assign the result back to frame['b'] if you want to keep it:
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=
...: ['Utah', 'Ohio', 'Texas', 'Oregon'])
In [5]: frame.head()
Out[5]:
               b         d         e
Utah   -1.165332 -0.999244 -0.541742
Ohio   -0.319887  0.199094 -0.438669
Texas  -1.242524 -0.385092 -0.389616
Oregon  0.331593  0.505496  1.688962
In [6]: frame.b.add(1)
Out[6]:
Utah     -0.165332
Ohio      0.680113
Texas    -0.242524
Oregon    1.331593
Name: b, dtype: float64
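A short sketch of how add behaves, assuming the same kind of random frame as above (the names before, result and unchanged are only for illustration): the call on its own leaves the frame untouched, so the result still has to be assigned back to the column.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
before = frame['b'].copy()

# add returns a new Series and leaves the frame untouched ...
result = frame['b'].add(1)
unchanged = frame['b'].equals(before)

# ... so the result has to be assigned back to update the column.
frame['b'] = frame['b'].add(1)
```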
Upvotes: 1
Reputation: 967
frame
Out[6]:
               b         d         e
Utah   -0.764764  0.663018 -1.806592
Ohio    0.082226 -0.164653 -0.744252
Texas   0.763119  1.492637 -1.434447
Oregon -0.485245 -0.806335 -0.008397
frame['b'] += 1
frame
Out[8]:
               b         d         e
Utah    0.235236  0.663018 -1.806592
Ohio    1.082226 -0.164653 -0.744252
Texas   1.763119  1.492637 -1.434447
Oregon  0.514755 -0.806335 -0.008397
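As a quick check, the sketch below (on a freshly generated random frame, with illustrative variable names) compares the augmented assignment with the map version from the question and confirms that the other columns are left alone.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
expected = frame['b'].map(lambda x: x + 1)  # element-wise version from the question
others = frame[['d', 'e']].copy()

frame['b'] += 1  # 'b' is written only once

same_as_map = np.allclose(frame['b'], expected)
others_untouched = frame[['d', 'e']].equals(others)
```

The same pattern works for the other arithmetic operators (-=, *=, /=), but not for an arbitrary Python function.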
Edit to add:
If this is an arbitrary function and you really need to apply it in place, you can write a thin wrapper around pandas to handle it. Personally, I can't imagine a case where it would be so critical that you couldn't use the standard implementation (unless perhaps you write a tonne of code and can't be bothered to type the extra characters?).
from pandas import DataFrame
import numpy as np

class MyWrapper(DataFrame):
    # Thin DataFrame subclass that adds an "apply and assign back" helper.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def myapply(self, label, func):
        # Apply func to each element of the column and store the result under the same label.
        self[label] = super().__getitem__(label).apply(func)

df = frame = MyWrapper(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
print(df)
df.myapply('b', lambda x: x + 1)
print(df)
Gives:
               b         d         e
Utah   -0.260549 -0.981025  1.136154
Ohio    0.073732 -0.895937 -0.025134
Texas   0.555507 -1.173679  0.946342
Oregon  1.871728 -0.850992  1.135784
               b         d         e
Utah    0.739451 -0.981025  1.136154
Ohio    1.073732 -0.895937 -0.025134
Texas   1.555507 -1.173679  0.946342
Oregon  2.871728 -0.850992  1.135784
Obviously this is a very minimal example, but hopefully it exposes a few methods of interest to you.
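If subclassing DataFrame feels like too much, the same idea can be sketched as a plain function. Note that apply_inplace is a hypothetical helper name, not part of pandas, and it still assigns the result back under the hood; it only saves you from typing the label twice.

```python
import numpy as np
import pandas as pd

def apply_inplace(df, label, func):
    """Hypothetical helper: apply func element-wise to df[label] and store the result back."""
    df[label] = df[label].apply(func)

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
before = frame['b'].copy()

apply_inplace(frame, 'b', lambda x: x + 1)  # 'b' appears only once at the call site
```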
Upvotes: 7