mezz

Reputation: 437

str.contains to create new column in pandas dataframe

I am exploring the Titanic data set and want to create a column that groups similar names. For example, any name that contains "Charles" should show as "Ch", as I want to do some group-bys on that column later on. I created a function using the following code:

def cont(Name):
    for a in Name:
        if a.str.contains('Charles'):
            return('Ch')

and then applied using this:

titanic['namest']=titanic['Name'].apply(cont,axis=1)

Error: 'str' object has no attribute 'str'

notebook_link

Upvotes: 6

Views: 8431

Answers (2)

EdChum

Reputation: 394051

Rather than using a loop or apply, you can use the vectorised str.contains to build a boolean mask and set all rows where the condition is met to your desired value (the remaining rows are left as NaN in the new column):

titanic.loc[titanic['Name'].str.contains('Charles'), 'namest'] = 'Ch'
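
As a minimal sketch, here is the same mask-based assignment on a made-up three-row frame. The sample names are illustrative, not taken from the actual Titanic data, and passing na=False is an assumption that missing names should count as non-matches:

```python
import pandas as pd

# Hypothetical sample standing in for the Titanic data.
titanic = pd.DataFrame({
    "Name": ["Smith, Mr. Charles", "Jones, Miss. Anna", None],
})

# Boolean mask: True where the name contains "Charles".
# na=False treats missing names as non-matches instead of propagating NaN.
mask = titanic["Name"].str.contains("Charles", na=False)

# Assigning through .loc creates the new column; unmatched rows stay NaN.
titanic.loc[mask, "namest"] = "Ch"

print(titanic)
```

Without na=False, a missing name would put a missing value into the mask, which .loc may then refuse to use as a boolean indexer.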

Upvotes: 12

DeepSpace

Reputation: 81604

apply calls the cont function once per value in the Name column, passing it one value at a time. That means the Name variable inside the cont function is already a string, so it has no .str accessor and there is nothing to loop over.

Also note that a function used with apply should return a value for every input; otherwise the rows that don't match end up as None. So in case the name doesn't contain 'Charles', the name itself is returned.

Finally, the Series apply method doesn't have an axis keyword argument.

def cont(Name):
    if 'Charles' in Name:
        return 'Ch'
    return Name

You don't even need to define a named function:

titanic['namest'] = titanic['Name'].apply(lambda x: 'Ch' if 'Charles' in x else x)
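
To see both variants side by side, here is a small sketch on a made-up two-row frame (the sample names are hypothetical, and the parameter is renamed to lowercase name for readability):

```python
import pandas as pd

# Hypothetical sample standing in for the Titanic data.
titanic = pd.DataFrame({
    "Name": ["Smith, Mr. Charles", "Jones, Miss. Anna"],
})

def cont(name):
    # Series.apply passes one plain Python string at a time,
    # so the `in` operator is enough; there is no .str accessor here.
    if "Charles" in name:
        return "Ch"
    return name

# Named-function version (note: no axis argument on Series.apply).
titanic["namest"] = titanic["Name"].apply(cont)

# Equivalent lambda version, kept separate for comparison.
via_lambda = titanic["Name"].apply(lambda x: "Ch" if "Charles" in x else x)

print(titanic["namest"].tolist())  # ['Ch', 'Jones, Miss. Anna']
```

Both versions assume every name is a string. If the column contains missing values, the `in` test raises a TypeError on them, so they would need to be filled or skipped first.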

Upvotes: 4
