Reputation: 437
I am exploring the Titanic data set and want to create a column that groups similar names. For example, any name that contains "Charles" should show as "Ch", as I want to do some group-bys on that column later on. I created a function using the following code:
def cont(Name):
    for a in Name:
        if a.str.contains('Charles'):
            return('Ch')
and then applied it using this:
titanic['namest']=titanic['Name'].apply(cont,axis=1)
Error: 'str' object has no attribute 'str'
Upvotes: 6
Views: 8431
Reputation: 394051
Rather than use a loop or apply, you can use the vectorised str.contains to return a boolean mask and set all rows where the condition is met to your desired value:
titanic.loc[titanic['Name'].str.contains('Charles'), 'namest'] = 'Ch'
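For anyone who wants to try this without the full data set, here is a minimal sketch of the same idea. The four-row frame is a hypothetical stand-in for the real titanic DataFrame, and na=False is an addition that is only needed if the Name column can contain missing values:

```python
import pandas as pd

# Hypothetical sample rows standing in for the Titanic 'Name' column.
titanic = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Futrelle, Mr. Jacques Heath",
        "Williams, Mr. Charles Eugene",
        None,  # a missing name, to show why na=False is used below
    ]
})

# Boolean mask: True where the name contains 'Charles'.
# na=False treats missing names as "no match" so the mask stays purely boolean.
mask = titanic["Name"].str.contains("Charles", na=False)

# Assign 'Ch' only to the matching rows; the new column is NaN everywhere else.
titanic.loc[mask, "namest"] = "Ch"

print(titanic)
```

Rows that do not match are left as NaN in namest, so if you want a label for them as well you would need to fill the column afterwards.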
Upvotes: 12
Reputation: 81604
apply will call the cont function and pass it the values from the Name column one at a time. That means that the Name variable inside the cont function is already a string, so it has no .str accessor.
Also note that a function used with apply should return a value for every input (otherwise the non-matching rows end up as None), so when the name doesn't contain 'Charles' the name itself is returned.
Finally, the Series apply method doesn't have an axis keyword argument.
def cont(Name):
    if 'Charles' in Name:
        return 'Ch'
    return Name
You don't even need to define it:
titanic['namest'] = titanic['Name'].apply(lambda x: 'Ch' if 'Charles' in x else x)
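Since the goal in the question was to group by the new column later, here is a rough sketch of how that could look end to end. The three-row frame is a hypothetical stand-in for the real data, and groupby(...).size() is just one possible aggregation:

```python
import pandas as pd

# Hypothetical sample rows standing in for the Titanic 'Name' column.
titanic = pd.DataFrame({
    "Name": [
        "Williams, Mr. Charles Eugene",
        "Hays, Mr. Charles Melville",
        "Braund, Mr. Owen Harris",
    ]
})

# Each x is a plain Python string, so the ordinary 'in' operator works.
titanic["namest"] = titanic["Name"].apply(
    lambda x: "Ch" if "Charles" in x else x
)

# Count how many rows fall into each label.
counts = titanic.groupby("namest").size()
print(counts)
```

Note that the 'in' check assumes every name is a string; a missing value (NaN) in the column would raise a TypeError here.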
Upvotes: 4