RDJ
RDJ

Reputation: 4122

Pandas: How to apply a function to different columns

Let's say this is my function:

def function(x):
    return x.str.lower()

And this is my DataFrame (df)

   A         B     C       D 
0  1.67430   BAR  0.34380  FOO 
1  2.16323   FOO -2.04643  BAR
2  0.19911   BAR -0.45805  FOO
3  0.91864   BAR -0.00718  BAR
4  1.33683   FOO  0.53429  FOO
5  0.97684   BAR -0.77363  BAR

I want to apply the function to just columns B and D. (Applying it to the full DataFrame isn't the answer as that produces NaN values in the numeric columns).

This is my basic idea: df.apply(function, axis=1)

But I cannot fathom how to select distinct columns to apply the function to. I've tried all manner of indexing by numeric position, name, etc.

I've spent quite a bit of time reading around this. This isn't a direct duplicate of any of these:

How to apply a function to two columns of Pandas dataframe

Pandas: How to use apply function to multiple columns

Pandas: apply different functions to different columns

Python Pandas: Using 'apply' to apply 1 function to multiple columns

Upvotes: 3

Views: 9568

Answers (4)

Surya Chhetri
Surya Chhetri

Reputation: 11568

Column-wise Apply function for in-place edit:

In [194]: df = pd.DataFrame({"A": ["FOO","BAAR","FOO"], "B": ["FOO","BAR" , "FOO"]})

In [195]: df.loc[:,["A","B"]].apply(lambda col : col.str.lower(), axis = 0) # axis= 0, Default in pandas 
Out[195]: 
      A    B
0   foo  foo
1  baar  bar
2   foo  foo

Row-wise Apply function for in-place edit, if needed:

In [201]: df.loc[:,["A","B"]].apply(lambda row : row.str.lower(), axis = 1)
Out[201]: 
      A    B
0   foo  foo
1  baar  bar
2   foo  foo

Other Useful manipulations using column-wise and row-wise apply function:

# For column-wise operation using apply function:
In [224]: df = pd.DataFrame({"A": ["FOO","BAAR","FOO"], "B": ["FOO","BAR" , "FOO"]})

In [225]: df.loc[:,["A","B"]].apply(lambda col : col.str.lower() + "_" + "cool" + "_" + df["B"])
Out[225]: 
               A             B
0   foo_cool_FOO  foo_cool_FOO
1  baar_cool_BAR  bar_cool_BAR
2   foo_cool_FOO  foo_cool_FOO

#Note only second element from each column is taken as an argument for lambda function, so NaN for others:
In [226]: df.loc[:,["A","B"]].apply(lambda col : col[1:2].str.lower() + "_"+ "cool" + "_" + df["B"])
Out[226]: 
               A             B
0            NaN           NaN
1  baar_cool_BAR  bar_cool_BAR
2            NaN           NaN


#For Row-wise operation (row[0] & row[1] points to first and second element of each row, 
#or can be called as row["A"] $ row["B"] respectively ):
In [207]: df.loc[:,["A","B"]].apply(lambda row : row["B"].lower() + "_" + row["A"].lower() , axis = 1)
Out[207]: 
0     foo_foo
1    bar_baar
2     foo_foo
dtype: object

In [208]: df.loc[:,["A","B"]].apply(lambda row : row[1].lower() + "_" + row[0].lower() , axis = 1)
Out[208]: 
0     foo_foo
1    bar_baar
2     foo_foo
dtype: object

#Here, row[1] indicates second element of each row, i.e row["B"] :
In [235]: df.loc[:,["A","B"]].apply(lambda row : row.str.lower() + "_"+ row[1], axis = 1)
Out[235]: 
          A        B
0   foo_FOO  foo_FOO
1  baar_BAR  bar_BAR
2   foo_FOO  foo_FOO

Upvotes: 3

Surya Chhetri
Surya Chhetri

Reputation: 11568

Clean syntax to in-place edit the original column:

df[["A", "B"]] = df[["A","B"]].apply(lambda x: x.str.lower())

Also, to add on new column to the original dataframe:

df[["new_col1", "new_col2"]] = df[["A","B"]].apply(lambda x: x.str.lower())

Upvotes: 2

AChampion
AChampion

Reputation: 30258

Apply isn't inplace, it returns a new dataframe, so the question is can you return the complete dataframe in one go.
And you can do it, but it's ugly (it might be slightly faster):

df.apply(lambda x: x.str.lower() if x.name in ['B', 'D'] else x)

If you want to do it to all string columns you can just check the dtype.

Upvotes: 3

EdChum
EdChum

Reputation: 394041

Just subselect the columns from the df, by neglecting the axis param we operate column-wise rather than row-wise which will be significantly as you have more rows than columns here:

df[['B','D']].apply(function)

this will run your func against each column

In [186]:
df[['B','D']].apply(function)

Out[186]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar

You can also filter the df to just get the string dtype columns:

In [189]:
df.select_dtypes(include=['object']).apply(function)

Out[189]:
     B    D
0  bar  foo
1  foo  bar
2  bar  foo
3  bar  bar
4  foo  foo
5  bar  bar

Timings

column-wise versus row-wise:

In [194]:    
%timeit df.select_dtypes(include=['object']).apply(function, axis=1)
%timeit df.select_dtypes(include=['object']).apply(function)

100 loops, best of 3: 3.42 ms per loop
100 loops, best of 3: 2.37 ms per loop

However for significantly larger dfs (row-wise) the first method will scale much better

Upvotes: 9

Related Questions