Reputation: 615
I recently found out about the str method for Pandas series and it's great! However, if I want to chain operations (say, a couple of replace calls and a strip), I need to keep calling str after every operation, which doesn't make for the most elegant code.
For example, let's say my column names contain spaces and periods and I want to replace them with underscores. I might also want to strip any leftover leading or trailing underscores. If I wanted to do this using str methods, is there any way to avoid having to run:
df.columns.str.replace(' ', '_').str.replace('.', '_').str.strip('_')
Thanks!
Upvotes: 8
Views: 3929
Reputation: 3354
Let me add my two cents to improve the answers:
I was just curious if we could chain str operations together
We would like to have something like tweet.str.replace('@',).strip().lower().
In fact, we could hope that a chain of operations could be optimized even further (compiled) into something like tweet.str.replace_strip_lower_combined.
While this is perfectly reasonable, the current API only processes one operation at a time and doesn't support such combining.
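To make the limitation concrete, here is a minimal sketch of a wrapper that hides the repeated .str calls. The names chain_str and step are hypothetical helpers written for this illustration, not part of pandas, and each step still runs as a separate pass over the data, so nothing is fused or compiled.

```python
import pandas as pd

def step(name, *args, **kwargs):
    # Hypothetical helper: describes one .str method call.
    return name, args, kwargs

def chain_str(values, *steps):
    # Hypothetical helper: applies each described .str method in order,
    # re-entering the .str accessor on every intermediate result.
    result = values
    for name, args, kwargs in steps:
        result = getattr(result.str, name)(*args, **kwargs)
    return result

cols = pd.Index(['aa dd', 'dd.d_', 'd._'])
cleaned = chain_str(
    cols,
    step('replace', ' ', '_', regex=False),  # literal space
    step('replace', '.', '_', regex=False),  # literal period, not "any character"
    step('strip', '_'),
)
print(list(cleaned))
```

This only moves the repetition into a helper; whether that reads better than the plain chained version is a matter of taste.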
Why not use list comprehensions?
Because of performance: pd.Series.str offers vectorized string functions.
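If performance is the deciding factor, it may be worth measuring on your own data rather than assuming. The following is a rough timing sketch, assuming a synthetic Series of 30,000 short names; the function names are made up for the comparison and the timings will vary by machine and pandas version.

```python
import re
import timeit

import pandas as pd

# Synthetic data, assumed for illustration only.
names = pd.Series(['aa dd', 'dd.d_', 'd._'] * 10_000)

def with_str_accessor(s):
    # Two passes through the .str accessor.
    return s.str.replace(r'[\s.]', '_', regex=True).str.strip('_')

def with_list_comp(s):
    # One Python-level loop over the raw strings.
    return pd.Series([re.sub(r'[\s.]', '_', x).strip('_') for x in s], index=s.index)

for func in (with_str_accessor, with_list_comp):
    seconds = timeit.timeit(lambda: func(names), number=5)
    print(f'{func.__name__}: {seconds:.3f} s for 5 runs')
```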
Upvotes: 0
Reputation: 402824
Why not use a list comprehension?
import re
df.columns = [re.sub(r'[\s.]', '_', x).strip('_') for x in df.columns]
In a list comp, you're working with the string object directly, without the need to call .str each time.
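As a self-contained sketch of the same idea, the comprehension can be wrapped in a small function and tried on plain strings, with no DataFrame needed. The name clean_names is a hypothetical helper chosen for this example.

```python
import re

def clean_names(names):
    # Replace each whitespace character or period with an underscore,
    # then trim any underscores left at either end.
    return [re.sub(r'[\s.]', '_', name).strip('_') for name in names]

print(clean_names(['aa dd', 'dd.d_', 'd._']))
# ['aa_dd', 'dd_d', 'd']
```

The result can then be assigned back with df.columns = clean_names(df.columns).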
Upvotes: 2
Reputation: 863176
I think you need to repeat str for each .str function; that is by design.
But here it is possible to use only one replace:
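import pandas as pd
df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._'])
print(df)
Empty DataFrame
Columns: [aa dd, dd.d_, d._]
Index: []
# regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default
print(df.columns.str.replace(r'[\s.]', '_', regex=True).str.strip('_'))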
Index(['aa_dd', 'dd_d', 'd'], dtype='object')
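Note that the call above only prints the cleaned names. A short sketch of writing them back to the frame follows, with one assumed variation: a + quantifier after the character class, so a run such as ' . ' collapses into a single underscore instead of three. That quantifier is an addition for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(columns=['aa dd', 'dd.d_', 'd._', 'x . y'])

# Assign the cleaned Index back; Index objects are immutable,
# so the .str methods return a new Index rather than editing in place.
df.columns = df.columns.str.replace(r'[\s.]+', '_', regex=True).str.strip('_')

print(list(df.columns))
# ['aa_dd', 'dd_d', 'd', 'x_y']
```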
Upvotes: 7