Reputation: 4122
Let's say this is my function:
def function(x):
return x.str.lower()
And this is my DataFrame (df)
A B C D
0 1.67430 BAR 0.34380 FOO
1 2.16323 FOO -2.04643 BAR
2 0.19911 BAR -0.45805 FOO
3 0.91864 BAR -0.00718 BAR
4 1.33683 FOO 0.53429 FOO
5 0.97684 BAR -0.77363 BAR
I want to apply the function to just columns B
and D
. (Applying it to the full DataFrame isn't the answer as that produces NaN values in the numeric columns).
This is my basic idea: df.apply(function, axis=1)
But I cannot fathom how to select distinct columns to apply the function to. I've tried all manner of indexing by numeric position, name, etc.
I've spent quite a bit of time reading around this. This isn't a direct duplicate of any of these:
How to apply a function to two columns of Pandas dataframe
Pandas: How to use apply function to multiple columns
Pandas: apply different functions to different columns
Python Pandas: Using 'apply' to apply 1 function to multiple columns
Upvotes: 3
Views: 9568
Reputation: 11568
Column-wise Apply function for in-place edit:
In [194]: df = pd.DataFrame({"A": ["FOO","BAAR","FOO"], "B": ["FOO","BAR" , "FOO"]})
In [195]: df.loc[:,["A","B"]].apply(lambda col : col.str.lower(), axis = 0) # axis= 0, Default in pandas
Out[195]:
A B
0 foo foo
1 baar bar
2 foo foo
Row-wise Apply function for in-place edit, if needed:
In [201]: df.loc[:,["A","B"]].apply(lambda row : row.str.lower(), axis = 1)
Out[201]:
A B
0 foo foo
1 baar bar
2 foo foo
Other Useful manipulations using column-wise and row-wise apply function:
# For column-wise operation using apply function:
In [224]: df = pd.DataFrame({"A": ["FOO","BAAR","FOO"], "B": ["FOO","BAR" , "FOO"]})
In [225]: df.loc[:,["A","B"]].apply(lambda col : col.str.lower() + "_" + "cool" + "_" + df["B"])
Out[225]:
A B
0 foo_cool_FOO foo_cool_FOO
1 baar_cool_BAR bar_cool_BAR
2 foo_cool_FOO foo_cool_FOO
#Note only second element from each column is taken as an argument for lambda function, so NaN for others:
In [226]: df.loc[:,["A","B"]].apply(lambda col : col[1:2].str.lower() + "_"+ "cool" + "_" + df["B"])
Out[226]:
A B
0 NaN NaN
1 baar_cool_BAR bar_cool_BAR
2 NaN NaN
#For Row-wise operation (row[0] & row[1] points to first and second element of each row,
#or can be called as row["A"] $ row["B"] respectively ):
In [207]: df.loc[:,["A","B"]].apply(lambda row : row["B"].lower() + "_" + row["A"].lower() , axis = 1)
Out[207]:
0 foo_foo
1 bar_baar
2 foo_foo
dtype: object
In [208]: df.loc[:,["A","B"]].apply(lambda row : row[1].lower() + "_" + row[0].lower() , axis = 1)
Out[208]:
0 foo_foo
1 bar_baar
2 foo_foo
dtype: object
#Here, row[1] indicates second element of each row, i.e row["B"] :
In [235]: df.loc[:,["A","B"]].apply(lambda row : row.str.lower() + "_"+ row[1], axis = 1)
Out[235]:
A B
0 foo_FOO foo_FOO
1 baar_BAR bar_BAR
2 foo_FOO foo_FOO
Upvotes: 3
Reputation: 11568
Clean syntax to in-place edit the original column:
df[["A", "B"]] = df[["A","B"]].apply(lambda x: x.str.lower())
Also, to add on new column to the original dataframe:
df[["new_col1", "new_col2"]] = df[["A","B"]].apply(lambda x: x.str.lower())
Upvotes: 2
Reputation: 30258
Apply isn't inplace, it returns a new dataframe, so the question is can you return the complete dataframe in one go.
And you can do it, but it's ugly (it might be slightly faster):
df.apply(lambda x: x.str.lower() if x.name in ['B', 'D'] else x)
If you want to do it to all string columns you can just check the dtype.
Upvotes: 3
Reputation: 394041
Just subselect the columns from the df, by neglecting the axis
param we operate column-wise rather than row-wise which will be significantly as you have more rows than columns here:
df[['B','D']].apply(function)
this will run your func against each column
In [186]:
df[['B','D']].apply(function)
Out[186]:
B D
0 bar foo
1 foo bar
2 bar foo
3 bar bar
4 foo foo
5 bar bar
You can also filter the df to just get the string dtype columns:
In [189]:
df.select_dtypes(include=['object']).apply(function)
Out[189]:
B D
0 bar foo
1 foo bar
2 bar foo
3 bar bar
4 foo foo
5 bar bar
Timings
column-wise versus row-wise:
In [194]:
%timeit df.select_dtypes(include=['object']).apply(function, axis=1)
%timeit df.select_dtypes(include=['object']).apply(function)
100 loops, best of 3: 3.42 ms per loop
100 loops, best of 3: 2.37 ms per loop
However for significantly larger dfs (row-wise) the first method will scale much better
Upvotes: 9