Reputation: 344
I have a pandas dataframe:
import pandas as pd
df = pd.DataFrame({"Code": [77000581079,77000458432,77000458433,77000458434,77000691973], "Description": ['16/06/2009ø01/08/2009', 'ø16/06/2009:Date Breakpoint','16/06/2009ø01/08/2009:Date Breakpoint','01/08/2009ø:Date Breakpoint','01/08/2009ø:Date Breakpoint']})
I want to check whether Description contains the string 16/06/2009ø01/08/2009:Date Breakpoint. If it does, I want to append -A to the Code.
Expected output:
Code Description
0 77000581079-A 16/06/2009ø01/08/2009:Date Breakpoint
1 77000458432 ø16/06/2009:Date Breakpoint
2 77000458433-A 16/06/2009ø01/08/2009:Date Breakpoint
3 77000458434 01/08/2009ø:Date Breakpoint
4 77000691973 01/08/2009ø:Date Breakpoint
Using:
for row in df['Description']:
    if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint'):
        print(row)
    else:
        pass
I get ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
I've tried:
for row in df['Description']:
    if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').all():
        print(row)
    else:
        pass
But still no joy. I've read some docs on this error, but I'm a bit confused about its meaning.
Is there a better way to achieve my desired outcome?
Upvotes: 0
Views: 64
Reputation: 8564
You need to write your condition like this:
if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').sum() > 0:
    print(row)
Without sum, your condition returns an indicator vector (a boolean Series with one value per row), so Python can't directly evaluate its truth value. It is an array of Falses and Trues; summing it gives a positive (>0) value if even a single True is present. This is what you get without sum:
>>> df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint')
0 False
1 False
2 True
3 False
4 False
Name: Description, dtype: bool
>>> df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').sum()
1
Or just use any:
if df['Description'].str.contains('16/06/2009ø01/08/2009:Date Breakpoint').any():
    print(row)
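A minimal sketch of why the original if fails and why .any() or .sum() > 0 fixes it, reusing the question's data. The names target, mask, raised and matching_rows are illustrative, and regex=False is an added choice so the search string is matched literally. Note that in the loop form above the condition does not depend on row, so every row is printed whenever any row matches; since row is a plain str inside the loop, a per-row substring test is another option.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "Code": [77000581079, 77000458432, 77000458433, 77000458434, 77000691973],
    "Description": ['16/06/2009ø01/08/2009', 'ø16/06/2009:Date Breakpoint',
                    '16/06/2009ø01/08/2009:Date Breakpoint',
                    '01/08/2009ø:Date Breakpoint', '01/08/2009ø:Date Breakpoint'],
})
target = '16/06/2009ø01/08/2009:Date Breakpoint'

# One boolean per row (regex=False: treat target as a literal string)
mask = df['Description'].str.contains(target, regex=False)

# `if mask:` calls bool() on the whole Series, which pandas refuses to do
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# .any() and .sum() > 0 both collapse the Series to a single truth value
has_match_any = bool(mask.any())
has_match_sum = bool(mask.sum() > 0)

# Inside a loop, each row is a plain str, so `in` checks just that row
matching_rows = [row for row in df['Description'] if target in row]

print(raised, has_match_any, has_match_sum)
print(matching_rows)
```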
Upvotes: 0
Reputation: 323226
Let us try str.contains
df.Code = df.Code.astype(str)
df.loc[df.Description.str.contains('16/06/2009ø01/08/2009:Date Breakpoint'),'Code'] += '-A'
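An end-to-end sketch of the str.contains approach, runnable on its own with the question's data. Passing regex=False is an assumption added here so characters in the search string are never read as regex syntax; this particular string contains none, so the result is the same with the default.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "Code": [77000581079, 77000458432, 77000458433, 77000458434, 77000691973],
    "Description": ['16/06/2009ø01/08/2009', 'ø16/06/2009:Date Breakpoint',
                    '16/06/2009ø01/08/2009:Date Breakpoint',
                    '01/08/2009ø:Date Breakpoint', '01/08/2009ø:Date Breakpoint'],
})
target = '16/06/2009ø01/08/2009:Date Breakpoint'

# Code holds integers; convert to str so '-A' can be concatenated
df['Code'] = df['Code'].astype(str)

# Boolean mask of rows whose Description contains target as a literal substring
mask = df['Description'].str.contains(target, regex=False)

# Append '-A' only on the rows selected by the mask
df.loc[mask, 'Code'] += '-A'
print(df)
```

With the DataFrame exactly as posted, only row 2 gets -A: row 0's Description is '16/06/2009ø01/08/2009' without ':Date Breakpoint', so it does not contain the full search string, unlike the expected output shown in the question.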
Upvotes: 4