GavinBelson

Reputation: 2784

Pandas Apply Syntax

I can't figure out how to apply a simple function to every row of a column in a pandas DataFrame.

Example:

import pandas

def delLastThree(x):
    x = x.strip()
    x = x[:-3]
    return x

arr = ['test123','test234','test453']
arrDF = pandas.DataFrame(arr)
arrDF.columns = ['colOne']
arrDF['colOne'].apply(delLastThree)
print(arrDF)

I would expect the code above to print 'test' for every row. Instead it prints the original values.

How do I apply the delLastThree function to every row in the DF?

Upvotes: 0

Views: 601

Answers (2)

Stefan

Reputation: 42885

Selecting with single brackets, as in df['colOne'], gives you a pd.Series. Note that .apply() returns a new object rather than modifying arrDF, so the result has to be assigned back if you want to keep it.

You can use .apply(func, axis=1) on a DataFrame, i.e. after selecting with [['colOne']] or without selecting any columns. With axis=1, however, each row is passed to the function as a pd.Series, so the function has to go through the .str accessor to reach the string methods.

With the pd.Series resulting from selecting with ['colOne'], you can use either just .apply() or .map().
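The difference between the two selections can be checked directly. This is only a small sketch; the names single, double and row_types are made up for illustration.

```python
import pandas as pd

arrDF = pd.DataFrame({'colOne': ['test123', 'test234', 'test453']})

single = arrDF['colOne']      # single brackets: a pd.Series
double = arrDF[['colOne']]    # double brackets: a one-column pd.DataFrame

print(type(single).__name__)
print(type(double).__name__)

# With axis=1, DataFrame.apply passes each row to the function as a Series
row_types = double.apply(lambda row: type(row).__name__, axis=1)
print(row_types.tolist())
```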

import pandas as pd

def delLastThree_series(x):
    x = x.strip()
    x = x[:-3]
    return x

def delLastThree_df(x):
    x = x.str.strip()
    x = x.str[:-3]
    return x

arr = ['test123','test234','test453']
arrDF = pd.DataFrame(arr)

arrDF.columns = ['colOne']

Now use either

arrDF.apply(delLastThree_df, axis=1)
arrDF[['colOne']].apply(delLastThree_df, axis=1)

or

arrDF['colOne'].apply(delLastThree_series)
arrDF['colOne'].map(delLastThree_series)

to get:

  colOne
0   test
1   test
2   test

You could of course also just use the vectorized string methods:

arrDF['colOne'].str.strip().str[:-3]
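As for why the code in the question prints the original values: none of these calls change arrDF by themselves. A minimal sketch of that, assuming the same data as in the question (del_last_three, result and unchanged are just illustrative names):

```python
import pandas as pd

def del_last_three(x):
    # strip surrounding whitespace, then drop the final three characters
    return x.strip()[:-3]

arrDF = pd.DataFrame({'colOne': ['test123', 'test234', 'test453']})

result = arrDF['colOne'].apply(del_last_three)  # a new Series
unchanged = arrDF['colOne'].tolist()            # the column is still the original

arrDF['colOne'] = result                        # assign back to keep the result
print(arrDF)
```

Only after the assignment does printing arrDF show the shortened strings.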

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210852

Use the map() function for a Series (a single column):

In [15]: arrDF['colOne'].map(delLastThree)
Out[15]:
0    test
1    test
2    test
Name: colOne, dtype: object

or, if you want to update the DataFrame, assign the result back:

In [16]: arrDF['colOne'] = arrDF['colOne'].map(delLastThree)

In [17]: arrDF
Out[17]:
  colOne
0   test
1   test
2   test

But as @Stefan said, the vectorized string methods will be much faster and more "Pandonic":

arrDF['colOne'] = arrDF['colOne'].str.strip().str[:-3]

or if you want to strip all trailing spaces and numbers:

arrDF['colOne'] = arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)

test:

In [21]: arrDF['colOne'].str.replace(r'[\s\d]+$', '', regex=True)
Out[21]:
0    test
1    test
2    test
Name: colOne, dtype: object
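Note that the slice and the regex are not equivalent in general: the slice always drops exactly three characters, while the pattern drops however many trailing digits and spaces there are. A small sketch with made-up sample values (not from the question) to show the difference; regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default:

```python
import pandas as pd

# hypothetical sample data chosen to make the two approaches diverge
s = pd.Series(['test123', 'abc12 ', 'nodigits'])

# always removes the last three characters after stripping whitespace
sliced = s.str.strip().str[:-3]

# removes only a trailing run of whitespace and digits, if any
regexed = s.str.replace(r'[\s\d]+$', '', regex=True)

print(sliced.tolist())
print(regexed.tolist())
```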

Upvotes: 1
