James Cook

Reputation: 344

check if string is contained within cell value

I have a pandas dataframe:

import pandas as pd

df = pd.DataFrame({
    "Code": [77000581079, 77000458432, 77000458433, 77000458434, 77000691973],
    "Description": [
        '16/06/2009ø01/08/2009',
        'ø16/06/2009:Date Breakpoint',
        '16/06/2009ø01/08/2009:Date Breakpoint',
        '01/08/2009ø:Date Breakpoint',
        '01/08/2009ø:Date Breakpoint',
    ],
})

I want to check whether Description contains the string 16/06/2009ø01/08/2009:Date Breakpoint

If it does, I want to append -A to the Code value in that row.

Expected output:

   Code            Description
0  77000581079-A   16/06/2009ø01/08/2009:Date Breakpoint
1  77000458432     ø16/06/2009:Date Breakpoint
2  77000458433-A   16/06/2009ø01/08/2009:Date Breakpoint
3  77000458434     01/08/2009ø:Date Breakpoint
4  77000691973     01/08/2009ø:Date Breakpoint

Using:

for row in df['Description']:
    if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint'):
        print(row)
    else:
        pass

I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I've tried:

for row in df['Description']:
    if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').all():
        print(row)
    else:
        pass

But still no joy. I've read some docs on this error, but I'm a bit confused about its meaning.

Is there a better way to achieve my desired outcome?

Upvotes: 0

Views: 64

Answers (2)

Jarvis

Reputation: 8564

You need to write your condition like this:

if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').sum() > 0:
    print(row)

Without sum, your condition returns a boolean Series with one True or False per row, so Python can't reduce it to a single truth value, which is what the ValueError is telling you. Summing the Series counts the True entries, so the result is positive (> 0) as soon as a single True is present. This is what you get without sum:

>>> df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint')
0    False
1    False
2     True
3    False
4    False
Name: Description, dtype: bool
>>> df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').sum()
1

or just use any:

if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').any():
    print(row)
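
To make the difference concrete, here is a minimal sketch built from the Description values in the question. The names target, mask, ambiguous, any_match, match_count and matching_rows are my own, and regex=False is an assumption on my part so the search string is treated as a literal rather than a regular expression. Note that any() and sum() answer "does at least one row match?" for the whole column; to print only the matching rows you would index with the boolean Series instead.

```python
import pandas as pd

df = pd.DataFrame({
    "Description": [
        "16/06/2009ø01/08/2009",
        "ø16/06/2009:Date Breakpoint",
        "16/06/2009ø01/08/2009:Date Breakpoint",
        "01/08/2009ø:Date Breakpoint",
        "01/08/2009ø:Date Breakpoint",
    ]
})
target = "16/06/2009ø01/08/2009:Date Breakpoint"

# One True/False per row; regex=False treats target as a literal string.
mask = df["Description"].str.contains(target, regex=False)

# Using the whole Series as a condition is what raises the ValueError.
try:
    bool(mask)
    ambiguous = False
except ValueError:
    ambiguous = True

# Reducing the Series to a single value is fine.
any_match = bool(mask.any())    # at least one row matches
match_count = int(mask.sum())   # number of matching rows

# Indexing with the mask selects only the rows that match.
matching_rows = df.loc[mask, "Description"].tolist()
for row in matching_rows:
    print(row)
```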

Upvotes: 0

BENY

Reputation: 323226

Use str.contains to build a boolean mask, then append the suffix to the matching rows with .loc:

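# Code is numeric, so convert it to str before appending a suffix
df.Code = df.Code.astype(str)
# Append '-A' only in rows whose Description contains the target string
df.loc[df.Description.str.contains('16/06/2009ø01/08/2009:Date Breakpoint'),'Code'] += '-A'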
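
For completeness, the same idea as a self-contained sketch that can be pasted and run as is. The variable names target and mask are mine, and regex=False is an assumption I added so the pattern is matched literally.

```python
import pandas as pd

df = pd.DataFrame({
    "Code": [77000581079, 77000458432, 77000458433, 77000458434, 77000691973],
    "Description": [
        "16/06/2009ø01/08/2009",
        "ø16/06/2009:Date Breakpoint",
        "16/06/2009ø01/08/2009:Date Breakpoint",
        "01/08/2009ø:Date Breakpoint",
        "01/08/2009ø:Date Breakpoint",
    ],
})
target = "16/06/2009ø01/08/2009:Date Breakpoint"

# The codes are integers, so convert them to strings before adding a suffix.
df["Code"] = df["Code"].astype(str)

# Boolean mask: True where Description contains the literal target string.
mask = df["Description"].str.contains(target, regex=False)

# Append the suffix only in the rows selected by the mask.
df.loc[mask, "Code"] += "-A"

print(df)
```

With the sample data exactly as posted, only row 2 gets the suffix: row 0's Description is '16/06/2009ø01/08/2009' without ':Date Breakpoint', so it does not contain the full search string.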

Upvotes: 4
