Bal Krishna Jha

Reputation: 7286

Different results for str.contains and str.find

I expected both of these to give the same answer:

train = pd.read_csv('https://raw.github.com/mattdelhey/kaggle-titanic/master/Data/train.csv')
train.name.str.contains('Mr.').sum()
(train.name.str.find('Mr.')>0).sum()

but the output is:

647
517

Why do the two give different results?

Upvotes: 1

Views: 56

Answers (1)

jezrael

Reputation: 863741

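The difference is that str.contains also matches Mrs., because by default it treats the pattern as a regular expression, and . is a special regex character that matches any character. str.find, by contrast, searches for a literal substring.

You need to escape the dot or pass the parameter regex=False: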

print(train.name.str.contains(r'Mr\.').sum())
517
print(train.name.str.contains('Mr.', regex=False).sum())
517
print((train.name.str.find('Mr.')>0).sum())
517
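
A minimal sketch of the same behaviour on a toy Series (the four names below are illustrative, not the Titanic file). It also shows one thing to watch for with str.find: it returns -1 when the substring is absent and 0 when the match is at the very start of the string, so `> 0` would miss a name that begins with "Mr."; `>= 0` is the safer test in general.

```python
import pandas as pd

# Toy data for illustration only (not read from the Titanic CSV)
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Heikkinen, Miss. Laina",
    "Mr. Smith",
])

# Default regex=True: "." matches any character, so "Mrs" also matches "Mr."
regex_hits = names.str.contains("Mr.")

# Escaped dot (raw string) or regex=False: only a literal "Mr." matches
escaped_hits = names.str.contains(r"Mr\.")
literal_hits = names.str.contains("Mr.", regex=False)

# str.find returns the position of the literal substring, or -1 if absent
find_pos = names.str.find("Mr.")

print(regex_hits.tolist())       # [True, True, False, True]
print(escaped_hits.tolist())     # [True, False, False, True]
print(literal_hits.tolist())     # [True, False, False, True]
print(find_pos.tolist())         # [8, -1, -1, 0]
print((find_pos > 0).tolist())   # [True, False, False, False]  misses "Mr. Smith"
print((find_pos >= 0).tolist())  # [True, False, False, True]
```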

Checking which rows differ:

a = train.loc[train.name.str.contains('Mr.'), 'name']
b = train.loc[(train.name.str.find('Mr.')>0), 'name']


c = pd.concat([a, b], axis=1, keys=('contains','find'))
c = c[c.isnull().any(axis=1)]
print(c)
                                              contains find
1    Cumings, Mrs. John Bradley (Florence Briggs Th...  NaN
3         Futrelle, Mrs. Jacques Heath (Lily May Peel)  NaN
8    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  NaN
9                  Nasser, Mrs. Nicholas (Adele Achem)  NaN
15                    Hewlett, Mrs. (Mary D Kingcome)   NaN
18   Vander Planke, Mrs. Julius (Emelia Maria Vande...  NaN
19                             Masselmani, Mrs. Fatima  NaN
25   Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...  NaN
31      Spencer, Mrs. William Augustus (Marie Eugenie)  NaN
40      Ahlin, Mrs. Johan (Johanna Persdotter Larsson)  NaN
41   Turpin, Mrs. William John Robert (Dorothy Ann ...  NaN
49       Arnold-Franchi, Mrs. Josef (Josefine Franchi)  NaN
52            Harper, Mrs. Henry Sleeper (Myna Haxtun)  NaN
53   Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkin...  NaN
66                        Nye, Mrs. (Elizabeth Ramell)  NaN
85   Backstrom, Mrs. Karl Alfred (Maria Mathilda Gu...  NaN
...
...
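
As an alternative to concat, the rows that only str.contains picks up could probably also be selected by combining the two boolean masks directly. This is a sketch on a small toy Series (illustrative names, not the full dataset); the variable names are my own.

```python
import pandas as pd

# Toy data for illustration only
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley",
    "Futrelle, Mrs. Jacques Heath",
    "Heikkinen, Miss. Laina",
])

mask_contains = names.str.contains("Mr.")   # regex: "." matches any character
mask_find = names.str.find("Mr.") >= 0      # literal substring search

# Rows matched by the regex but not by the literal search
only_contains = names[mask_contains & ~mask_find]
print(only_contains)
```

On this toy input only the two "Mrs." rows are selected, which mirrors the table above.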

Upvotes: 1
