user3486773

Reputation: 1246

How to conditionally remove vowels from pandas dataframe columns?

I have a simple dataframe df:

{'Testingthislongcolumnthatwouldbreakoracle': {0: 3, 1: 3, 2: 3},
 'goodcolum': {0: 1, 1: 1, 2: 1},
 'goodcolum2': {0: 2, 1: 2, 2: 2}}

I am trying to determine whether the length of each column name is > 30 and, if so, check whether removing the vowels would bring the length down to <= 30. If it would, I'd like to strip the vowels out of the column name and save it back to the dataframe. Here is what I have so far:

for columnName in df:
    charlength=len(columnName)
    vowels=sum(list(map(columnName.lower().count, "aeiou")))
    if charlength >= 31:
        if charlength - vowels <= 31:
             df[columnName] = df.columns([columnName]).str.replace('[aAeEiIoOuU]', '')
            
    print(columnName, charlength,vowels)
df

But this isn't making any changes. The end result should rename the column 'Testingthislongcolumnthatwouldbreakoracle' to 'Tstngthslngclmnthtwldbrkrcl'.

Upvotes: 0

Views: 315

Answers (2)

norie

Reputation: 9857

Try creating a list with the new names for the columns.

import pandas as pd

df = pd.DataFrame({'Testingthislongcolumnthatwouldbreakoracle': {0: 3, 1: 3, 2: 3},
 'goodcolum': {0: 1, 1: 1, 2: 1},
 'goodcolum2': {0: 2, 1: 2, 2: 2}})

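col_names = []
for columnName in df:
    charlength = len(columnName)
    vowels = sum(map(columnName.lower().count, "aeiou"))
    # Strip vowels only when the name is too long and stripping makes it fit in 30 characters
    if charlength > 30 and charlength - vowels <= 30:
        col_names.append(''.join(char for char in columnName if char not in 'aeiouAEIOU'))
    else:
        # Otherwise keep the name unchanged so col_names stays the same length as df.columns
        col_names.append(columnName)

print(df)
df.columns = col_names
print(df)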
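
The same idea can also be sketched with DataFrame.rename, which accepts a function that is applied to each column name. This is only a minimal sketch: the helper name shorten and its max_len parameter are hypothetical, and the 30-character limit is taken from the question.

```python
import pandas as pd

VOWELS = set("aeiouAEIOU")

def shorten(name, max_len=30):
    """Hypothetical helper: drop vowels only if that makes an overlong name fit."""
    if len(name) <= max_len:
        return name
    stripped = "".join(ch for ch in name if ch not in VOWELS)
    # Leave the name alone if stripping vowels still does not make it fit
    return stripped if len(stripped) <= max_len else name

df = pd.DataFrame({
    "Testingthislongcolumnthatwouldbreakoracle": [3, 3, 3],
    "goodcolum": [1, 1, 1],
    "goodcolum2": [2, 2, 2],
})

# rename returns a new DataFrame with the function applied to every column name
df = df.rename(columns=shorten)
print(list(df.columns))
```

Keeping the length check inside one function means every column always gets exactly one new name, so the number of names can never get out of step with the number of columns.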

Upvotes: 1

ddejohn

Reputation: 8962

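Python's built-in str.replace() matches a literal substring, not a regex character class such as '[aAeEiIoOuU]'. (The pandas .str.replace() accessor can take a regex when regex=True is in effect, but df.columns is not callable, so df.columns([columnName]) fails before any replacement happens, and assigning to df[columnName] sets the column's values rather than its name.) On a plain string, you can replace each vowel with the empty string individually: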

>>> s = "Testingthislongcolumnthatwouldbreakoracle"
>>> for vowel in "aeiou":
...     s = s.replace(vowel, "")
...
>>> s
'Tstngthslngclmnthtwldbrkrcl'
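
If a character class is preferred, a minimal sketch with the standard-library re module could look like the following. The variable name stripped is only illustrative, and re.IGNORECASE is used here as an assumption that uppercase vowels should be removed too, which the loop above does not do.

```python
import re

s = "Testingthislongcolumnthatwouldbreakoracle"

# The character class matches any single vowel; IGNORECASE also covers uppercase ones
stripped = re.sub(r"[aeiou]", "", s, flags=re.IGNORECASE)
print(stripped)
```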

Also, just a heads up: you don't need to cast your map to a list before summing the results. Passing a generator expression to sum() avoids building the intermediate list:

s = columnName.lower()
vowel_count = sum(s.count(v) for v in "aeiou")

Upvotes: 0
