Reputation: 2784
I can't figure out how to apply a simple function to every row of a column in a pandas DataFrame.
Example:
import pandas

def delLastThree(x):
    x = x.strip()
    x = x[:-3]
    return x

arr = ['test123','test234','test453']
arrDF = pandas.DataFrame(arr)
arrDF.columns = ['colOne']
arrDF['colOne'].apply(delLastThree)
print(arrDF)
I would expect the code above to return 'test' for every row. Instead, it prints the original values.
How do I apply the delLastThree function to every row in the DF?
Upvotes: 0
Views: 601
Reputation: 42885
You are creating a pd.Series when selecting with single brackets, as in df['colOne'].

Either use .apply(func, axis=1) on a DataFrame, i.e. either when selecting with double brackets, [['colOne']], or without selecting any columns. However, with .apply(axis=1) each row is passed to the function as a pd.Series, so you need to modify the function to use the .str accessor for string methods.

With the pd.Series resulting from selecting with ['colOne'], you can use either just .apply() or .map().
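import pandas as pd

def delLastThree_series(x):
    # x is a single string (one element of the Series)
    x = x.strip()
    x = x[:-3]
    return x

def delLastThree_df(x):
    # x is one row passed as a pd.Series, so use the .str accessor
    x = x.str.strip()
    x = x.str[:-3]
    return x

arr = ['test123','test234','test453']
arrDF = pd.DataFrame(arr)
arrDF.columns = ['colOne']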
Now use either
arrDF.apply(delLastThree_df, axis=1)
arrDF[['colOne']].apply(delLastThree_df, axis=1)
or
arrDF['colOne'].apply(delLastThree_series)
arrDF['colOne'].map(delLastThree_series)
to get:
colOne
0 test
1 test
2 test
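Note that none of these calls changes arrDF in place: .apply() (like .map()) returns a new object, which is why the code in the question printed the original values. A short sketch, building the same frame directly from a dict (result and unchanged are illustrative names, not part of the original answer):

```python
import pandas as pd

def delLastThree_series(x):
    # Strip surrounding whitespace, then drop the last three characters.
    x = x.strip()
    return x[:-3]

arrDF = pd.DataFrame({'colOne': ['test123', 'test234', 'test453']})

# apply() builds and returns a new Series; arrDF itself is left unchanged.
result = arrDF['colOne'].apply(delLastThree_series)
unchanged = arrDF['colOne'].tolist()
print(unchanged)        # ['test123', 'test234', 'test453']
print(result.tolist())  # ['test', 'test', 'test']

# Assign the result back to store the change in the DataFrame.
arrDF['colOne'] = result
print(arrDF['colOne'].tolist())  # ['test', 'test', 'test']
```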
Of course, you could also just use the vectorized string methods directly:
arrDF['colOne'].str.strip().str[:-3]
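A self-contained sketch of that one-liner, using the question's data plus one hypothetical padded value to show the strip step. As a design note, the .str methods pass missing values through, whereas a plain function calling x.strip() would raise an AttributeError on a missing value:

```python
import pandas as pd

# ' test234 ' is a hypothetical padded value added for illustration.
arrDF = pd.DataFrame({'colOne': ['test123', ' test234 ', 'test453']})

# .str.strip() removes surrounding whitespace element-wise;
# .str[:-3] then slices off the last three characters of each string.
arrDF['colOne'] = arrDF['colOne'].str.strip().str[:-3]
print(arrDF['colOne'].tolist())  # ['test', 'test', 'test']

# Missing values pass through the .str methods instead of raising.
with_missing = pd.Series(['test123', None])
trimmed = with_missing.str.strip().str[:-3]
print(trimmed.isna().tolist())  # [False, True]
```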
Upvotes: 2
Reputation: 210852
Use the map() method for a Series (a single column):
In [15]: arrDF['colOne'].map(delLastThree)
Out[15]:
0 test
1 test
2 test
Name: colOne, dtype: object
Or, if you want to change the column itself, assign the result back:
In [16]: arrDF['colOne'] = arrDF['colOne'].map(delLastThree)
In [17]: arrDF
Out[17]:
colOne
0 test
1 test
2 test
But, as @Stefan suggested, using the vectorized .str methods is more efficient and more "Pandonic":
arrDF['colOne'] = arrDF['colOne'].str.strip().str[:-3]
Or, if you want to strip all trailing whitespace and digits (pass regex=True explicitly; since pandas 2.0 the default is regex=False):
arrDF['colOne'] = arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)
Test:
In [21]: arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)
Out[21]:
0 test
1 test
2 test
Name: colOne, dtype: object
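The slicing and regex approaches only agree when every suffix is exactly three characters long. A sketch with hypothetical values whose numeric suffixes vary in length:

```python
import pandas as pd

# Hypothetical values whose numeric suffixes are not all three characters long.
s = pd.Series(['test123', 'test45 ', 'test7'])

# Fixed-width slicing always removes exactly three characters after stripping.
sliced = s.str.strip().str[:-3]
# The regex removes any run of trailing whitespace and digits.
# regex=True is explicit because pandas 2.0 changed the default to regex=False.
stripped = s.str.replace(r'[\s\d]+$', '', regex=True)

print(sliced.tolist())    # ['test', 'tes', 'te']
print(stripped.tolist())  # ['test', 'test', 'test']
```

Which one is appropriate depends on whether the suffix is guaranteed to be three characters or is simply "whatever digits come at the end".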
Upvotes: 1