Reputation: 101
I am trying to find the words in a DataFrame column that start and end with a vowel.
I couldn't find a regex way to (1) find all the words starting with a vowel.
I could only find words that start with one specific vowel.
Here is the code I used:
import pandas as pd
# Import the CSV file
sales_data = pd.read_csv("data/sales-data.csv")
#Words starting with 'A'. This works
Vowels1 = sales_data[sales_data['CUSTOMERNAME'].str.startswith('A')]
#Words starting with vowel. This doesn't work. Why?
Vowels2 = sales_data[sales_data['CUSTOMERNAME'].str.startswith(r'[aeiouAEIOU]')]
How can I add the condition that a word both starts and ends with a vowel?
#This should work, but it doesn't.
Vowels3 = sales_data[sales_data['CUSTOMERNAME'].str.startswith(r'^[aeiou].*[aeiou]$')]
The message I get for Vowels2 and Vowels3 is:
Empty DataFrame
Columns: [ORDERID, ORDERPRICE, ORDERDATE, STATUS, PRODUCTLINE, PRODUCTCODE, CUSTOMERNAME, CITY, COUNTRY]
Index: []
Thank you
Upvotes: 0
Views: 1326
Reputation: 521904
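str.startswith treats its argument as a literal string, not as a regular expression, so the bracket patterns never match. You could use str.contains here, which does interpret the pattern as a regex: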
import re
Vowels3 = sales_data[sales_data['CUSTOMERNAME'].str.contains(r'^[aeiou].*[aeiou]\.?$', flags=re.IGNORECASE)]
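As a quick check of what this pattern accepts, here is a minimal sketch on a small made-up Series (the names are illustrative and not taken from the asker's CSV). The `na=False` argument is an addition not in the line above; it is there on the assumption that the column may contain missing values.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for sales_data['CUSTOMERNAME'].
names = pd.Series([
    "Atelier graphique",
    "Euro Shopping Channel",
    "Online Diecast Co.",
    "Baane Mini Imports",
    None,
])

# Starts with a vowel, ends with a vowel, optionally followed by one period.
pattern = r'^[aeiou].*[aeiou]\.?$'

# na=False makes missing names count as non-matches instead of NaN,
# so the result can be used directly as a boolean mask.
mask = names.str.contains(pattern, flags=re.IGNORECASE, na=False)
matches = names[mask].tolist()
print(matches)
```

Because the pattern contains two separate character classes, a one-letter name such as "A" would not match; whether that matters depends on the data.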
Upvotes: 1
Reputation: 5745
Because you are interested only in the first and last letters, you don't need the regex
overhead, or even startswith, which looks for a sequence.
Instead, you can just apply the lambda lam to the column:
v = ('a','e','i','o','u','A','E','I','O','U')
lam = lambda word: word[0] in v and word[-1] in v
Please note that the case of an empty string is not handled here.
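One way to wire this into the DataFrame while also covering the empty-string case is sketched below. The function name and the sample data are hypothetical, and treating missing values as non-matches is an assumption about what the asker would want.

```python
import pandas as pd

VOWELS = ('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U')

def starts_and_ends_with_vowel(word):
    # Treat missing values (NaN/None) and empty strings as non-matches,
    # which avoids the IndexError that word[0] raises on "".
    if not isinstance(word, str) or not word:
        return False
    return word[0] in VOWELS and word[-1] in VOWELS

# Hypothetical sample data standing in for the asker's CSV.
sales_data = pd.DataFrame(
    {"CUSTOMERNAME": ["Anna", "Bob", "", None, "Oslo", "Eric"]}
)

# apply() calls the function once per value and returns a boolean mask.
mask = sales_data["CUSTOMERNAME"].apply(starts_and_ends_with_vowel)
result = sales_data[mask]
print(result)
```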
Upvotes: 0
Reputation: 740
Python's str.startswith and str.endswith accept tuples, so you can use those:
vowels = ('a','e','i','o','u','A','E','I','O','U')
if myword.startswith(vowels) and myword.endswith(vowels):
    print("Yes")
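As a small self-contained sketch of the same idea, the check can be wrapped in a function and used to filter a list of words. The function name and the word list are made up for illustration.

```python
VOWELS = tuple("aeiouAEIOU")

def starts_and_ends_with_vowel(word):
    # str.startswith / str.endswith return True if any tuple element matches.
    # An empty string matches none of them, so it is rejected without error.
    return word.startswith(VOWELS) and word.endswith(VOWELS)

# Hypothetical input words.
words = ["Alice", "Bob", "", "Oreo", "a"]
selected = [w for w in words if starts_and_ends_with_vowel(w)]
print(selected)
```

Unlike indexing with word[0], this version needs no special handling for empty strings, and a single-vowel word such as "a" counts as a match.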
Upvotes: 0