icatalan

Reputation: 101

Python: find words starting and ending with a vowel in a DataFrame column

I am trying to find the words in a DataFrame column that start and end with a vowel.

I couldn't find a regex that matches all the words starting with any vowel; I could only find words that start with one specific vowel.

Here is the code I used:

import pandas as pd

# Import the CSV file
sales_data = pd.read_csv("data/sales-data.csv")

# Words starting with 'A'. This works
Vowels1 = sales_data[sales_data['CUSTOMERNAME'].str.startswith('A')]

# Words starting with a vowel. This doesn't work. Why?
Vowels2 = sales_data[sales_data['CUSTOMERNAME'].str.startswith(r'[aeiouAEIOU]')]

How can I add the condition that a word both starts and ends with a vowel?

#This should work, but it doesn't.
Vowels3 = sales_data[sales_data['CUSTOMERNAME'].str.startswith(r'^[aeiou].*[aeiou]$')]
The output I get for both Vowels2 and Vowels3 is:
Empty DataFrame
Columns: [ORDERID, ORDERPRICE, ORDERDATE, STATUS, PRODUCTLINE, PRODUCTCODE, CUSTOMERNAME, CITY, COUNTRY]
Index: []

Thank you

Upvotes: 0

Views: 1326

Answers (3)

Tim Biegeleisen

Reputation: 521904

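You could use str.contains here. Unlike str.startswith, which only compares against a literal character sequence, str.contains treats its pattern as a regular expression. The optional \.? also lets a name end in a vowel followed by a period:

import re
Vowels3 = sales_data[sales_data['CUSTOMERNAME'].str.contains(r'^[aeiou].*[aeiou]\.?$', flags=re.IGNORECASE)]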
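A minimal sketch of why the question's str.startswith calls return an empty DataFrame while str.contains works. The small DataFrame below is hypothetical and stands in for data/sales-data.csv; the column name CUSTOMERNAME is taken from the question.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for data/sales-data.csv
sales_data = pd.DataFrame(
    {"CUSTOMERNAME": ["Alpine Co", "Euro Shop", "Oslo", "Asia Co.", "Mini Gifts"]}
)
names = sales_data["CUSTOMERNAME"]

# str.startswith compares against the literal text "[aeiouAEIOU]",
# so no name matches and the filtered DataFrame is empty
literal = names.str.startswith(r"[aeiouAEIOU]")

# str.contains interprets the pattern as a regular expression:
# a vowel first, a vowel last, optionally followed by one trailing period
regex = names.str.contains(r"^[aeiou].*[aeiou]\.?$", flags=re.IGNORECASE)

print(literal.tolist())
print(sales_data[regex])
```

Note that this pattern needs at least two characters, so a one-letter name such as "A" is not matched.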

Upvotes: 1

Lior Cohen

Reputation: 5745

Because you are interested only in the first and last letters, you don't need the overhead of a regex, or even of startswith, which looks for a whole sequence.

Instead, you can apply the lambda lam to the column:

v = ('a','e','i','o','u','A','E','I','O','U')
lam = lambda word: word[0] in v and word[-1] in v
Vowels4 = sales_data[sales_data['CUSTOMERNAME'].apply(lam)]

Note that empty strings are not handled here: for an empty string, word[0] raises an IndexError.
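One way to handle that case is sketched below, assuming a small hypothetical DataFrame in place of the real CSV. The helper name starts_and_ends_with_vowel is hypothetical; it checks the type and length before indexing, so empty strings and non-string values such as NaN count as non-matches instead of raising.

```python
import pandas as pd

v = ('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U')

def starts_and_ends_with_vowel(word):
    # Hypothetical helper: non-strings (e.g. NaN) and empty strings are non-matches,
    # so word[0] and word[-1] are only evaluated on non-empty strings
    return isinstance(word, str) and len(word) > 0 and word[0] in v and word[-1] in v

# Hypothetical sample data standing in for the CUSTOMERNAME column
sales_data = pd.DataFrame({"CUSTOMERNAME": ["Alpine Co", "", "Oslo", "Mini Gifts"]})

mask = sales_data['CUSTOMERNAME'].apply(starts_and_ends_with_vowel)
print(sales_data[mask])
```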

Upvotes: 0

Ismail Hafeez

Reputation: 740

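Python's str.startswith and str.endswith accept a tuple of prefixes or suffixes (and return True if any of them matches), so you can use those:

vowels = ('a','e','i','o','u','A','E','I','O','U')
myword = "Oslo"  # example word
if myword.startswith(vowels) and myword.endswith(vowels):
    print("Yes")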
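A minimal sketch of applying the same tuple check to the whole column, assuming a small hypothetical DataFrame in place of the real CSV. Python's own str methods are called per element through Series.apply.

```python
import pandas as pd

vowels = ('a', 'e', 'i', 'o', 'u', 'A', 'E', 'I', 'O', 'U')

# Hypothetical sample data standing in for the CUSTOMERNAME column
sales_data = pd.DataFrame({"CUSTOMERNAME": ["Euro Shop", "Oslo", "", "Asia Co"]})

# str.startswith/str.endswith return True if any element of the tuple matches;
# an empty string matches none of them, so it needs no special case
mask = sales_data['CUSTOMERNAME'].apply(
    lambda w: w.startswith(vowels) and w.endswith(vowels)
)
print(sales_data[mask])
```

Unlike indexing with word[0], this handles empty strings without a special case. A non-string value such as NaN would still raise an AttributeError, so if the column can contain missing values, fill them first (for example with fillna('')).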

Upvotes: 0
