user8560167

str.contains in pandas with if statement, python

This is a simple question, but I don't know why I can't get my if comparison to work correctly.

df:

A,B
1,marta
2,adam1
3,kama
4,mike

I want to print 'exist' if a specific name exists in df.

For example, I want to check whether 'marta' exists in df['B'].

code:

string=r'www\marta2'
if df['B'].str.contains(string,regex=False).all()==True:
    print('exist')
else:
    print('not exist')

When I use .bool() instead of .all(), I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I am receiving False on every row. Why? Should I compare this type of string in a different way?

EDIT: I need to use an if statement because in my real code, instead of printing, I need to assign variables; normally I would do it differently. If my string is 'marta' it works well, but not with the additional characters.

EDIT:

New code:
string=r'www\marta2'
if df['B'].str.rfind(string).any():
    print('exist')
else:
    print('not exist')
but it always prints 'exist'; it seems to match even if only one letter is in the column.
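A likely explanation for the "always exist" result: Series.str.rfind returns -1 (not 0 or False) when the search string is not found, and -1 is truthy, so .any() is True even when nothing matches. A minimal sketch reproducing this with the question's data (the DataFrame construction is an assumption matching the CSV above):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["marta", "adam1", "kama", "mike"]})
string = r"www\marta2"

# Series.str.rfind returns the highest index where `string` occurs in each value,
# or -1 when it does not occur at all.
positions = df["B"].str.rfind(string)
print(positions.tolist())  # [-1, -1, -1, -1]

# -1 is truthy, so .any() is True even though nothing matched.
print(bool(positions.any()))  # True

# Comparing against -1 gives the intended "found anywhere" check.
found = (positions != -1).any()
print("exist" if found else "not exist")  # not exist
```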

Upvotes: 0

Views: 8420

Answers (3)

user8560167

Answer to my own question: to get a single answer on whether the string exists in the column, a good way is to use df['B'].str.contains() followed by .any(). My first code didn't work because str.contains checks whether the whole search string occurs inside each value, and 'www\marta2' does not occur inside 'marta' (also, .all() requires every row to match). The second way, using rfind, always evaluates to True because str.rfind returns -1 when nothing is found, and -1 is truthy.

The solution is to preprocess the string I am comparing so that it matches the values in the column:

string=r'www\marta2'
# take the part after the backslash ('marta2'), then its first 5 characters ('marta')
new_string=string.split('\\')[-1][0:5]
if df['B'].str.contains(new_string,regex=False).any():
    print('exist')
else:
    print('not exist')
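The [0:5] slice works here only because 'marta' has five letters. A minimal sketch of a more general preprocessing step, assuming (this is not stated in the question) that the extra characters are always a prefix ending in a backslash plus trailing digits; the helper name extract_name is hypothetical:

```python
import re

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["marta", "adam1", "kama", "mike"]})


def extract_name(raw):
    """Hypothetical helper: drop everything up to the last backslash,
    then strip trailing digits (assumed to be noise, e.g. 'marta2' -> 'marta')."""
    last_part = raw.rsplit("\\", 1)[-1]
    return re.sub(r"\d+$", "", last_part)


string = r"www\marta2"
new_string = extract_name(string)  # 'marta'
if df["B"].str.contains(new_string, regex=False).any():
    print("exist")
else:
    print("not exist")
```

If the real prefixes or suffixes follow a different pattern, the regular expression would need to change accordingly.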

Upvotes: 0

Maybe this will help you:

>>> for b in df["B"].values:
...     # rfind returns -1 if b does not occur anywhere in string
...     if string.rfind(b) != -1:
...         print("exists")
...         break

The for loop iterates over df["B"].values, which returns the values of column B as an array. In the if statement, each value of column B is checked against string: rfind() returns the index of the last occurrence of the value inside string, or -1 if it is not found, so it also matches partial strings (substrings). This reverses the direction of the check compared with str.contains in the question: here we ask whether 'marta' is inside 'www\marta2', which is true. That's what makes it work.
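The same reverse check (is the column value contained in string?) can also be written without an explicit loop. A minimal sketch using Series.apply and Python's in operator, which for strings gives the same result as rfind(b) != -1; the DataFrame is assumed to match the question's data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["marta", "adam1", "kama", "mike"]})
string = r"www\marta2"

# For each value in column B, test whether it occurs somewhere inside `string`.
matches = df["B"].apply(lambda name: name in string)
print(matches.tolist())  # [True, False, False, False]

if matches.any():
    # df["B"][matches] holds the names that were found
    print("exists:", df["B"][matches].tolist())  # exists: ['marta']
else:
    print("not exists")
```

Unlike the loop, this also handles the "not found" case explicitly and reports which names matched.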

Upvotes: 0

Alex Huong Tran

Reputation: 71

If you want to check whether the string exists anywhere in the column, use any() instead of all().
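To illustrate the difference: str.contains returns one boolean per row; .all() is True only if every row matches, while .any() is True if at least one row does. A short sketch with the question's data, searching for 'marta':

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["marta", "adam1", "kama", "mike"]})

mask = df["B"].str.contains("marta", regex=False)
print(mask.tolist())  # [True, False, False, False]
print(bool(mask.all()))  # False: not every row contains 'marta'
print(bool(mask.any()))  # True: at least one row does
```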

If you want to check whether the string exists in each row, you can create a new column instead of using an if statement:

df.loc[df['B'].str.contains(string,regex=False), 'C'] = 'exist'
df.loc[~(df['B'].str.contains(string,regex=False)), 'C'] = 'not exist'
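The two .loc assignments can also be collapsed into a single step. One sketch, swapping in numpy.where (not used in the original answer), which picks 'exist' or 'not exist' for each row from the boolean mask; the sample data and search string are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["marta", "adam1", "kama", "mike"]})
string = "marta"

# Boolean mask: True where the value in B contains `string`
mask = df["B"].str.contains(string, regex=False)
# Pick 'exist' where mask is True, otherwise 'not exist'
df["C"] = np.where(mask, "exist", "not exist")
print(df)
```

This computes the mask once instead of twice.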

EDIT: I tried this, and it works as long as the exact string occurs inside the column values (str.contains searches for the whole string as a substring of each value).

string=r'www\marta2'
if df['B'].str.contains(string,regex=False).any():
    print('exist')
else:
    print('not exist')

Upvotes: 1
