Sue_ka

Reputation: 31

Cleaning a list of non-uniform phrases

I have a list which looks something like this.

["['brill building pop",'quiet storm','ballad','easy listening',"motown'"," 'disco",'soul jazz',
'smooth jazz','soul','jazz','soft rock',"uk garage'"," 'chill-out",'german pop','salsa','r&b',
'chanson','rock',"pop'"," 'blues-rock",'vocal jazz','funk','oldies','pop rock',"downtempo'",
" 'hip hop",'classic rock','united states','germany',"adult contemporary'"," 'folk rock",'vocal',
'soundtrack','blues','female vocalist',"electronic'"," 'new wave",'urban','reggae','singer-songwriter',
 'swing','60s',"female'"," 'american",'80s','90s',"ambient']"]

It is supposed to look like this:

['brill building pop','quiet storm','ballad','easy listening','motown','disco','soul jazz',
'smooth jazz','soul','jazz','soft rock','uk garage','chill-out','german pop','salsa','r&b',
'chanson','rock','pop','blues-rock','vocal jazz','funk','oldies','pop rock','downtempo',
'hip hop','classic rock','united states','germany','adult contemporary','folk rock','vocal',
'soundtrack','blues','female vocalist','electronic','new wave','urban','reggae','singer-songwriter',
'swing','60s','female','american','80s','90s','ambient']

As you can see, there are stray apostrophes, unmatched square brackets, extra whitespace, and so on. The elements are meant to be phrases, so I do not want to remove the whitespace in the middle of a phrase, only at the start or end. Is there an easy way to do this?

Upvotes: 0

Views: 28

Answers (1)

stevemo

Reputation: 1097

Structurally this is already the right list; each element just carries some extra characters. You can remove them with replace() and strip(), like this:

# z is the list from the question
zmod = [zz.replace("'", "").replace("[", "").replace("]", "").strip() for zz in z]
zmod
['brill building pop',
 'quiet storm',
 'ballad',
 'easy listening',
...
 'american',
 '80s',
 '90s',
 'ambient']

There's certainly a shorter regex approach, but I find this the most readable.
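
For comparison, here is a minimal sketch of two shorter variants: `str.strip()` with a set of characters, and an anchored regex. The sample list is a shortened copy of the one in the question, and the names `stripped`, `cleaned`, and `EDGE_JUNK` are made up for illustration.

```python
import re

# Shortened sample of the list from the question (assumed to be bound to `z`).
z = ["['brill building pop", 'quiet storm', "motown'", " 'disco", 'r&b', "ambient']"]

# Option 1: str.strip with a character set removes only leading/trailing
# spaces, apostrophes, and square brackets; interior characters are untouched.
stripped = [item.strip(" '[]") for item in z]

# Option 2: a regex doing the same job, anchored to the start and end of each string.
EDGE_JUNK = re.compile(r"^[\s'\[\]]+|[\s'\[\]]+$")
cleaned = [EDGE_JUNK.sub("", item) for item in z]

print(stripped)
# ['brill building pop', 'quiet storm', 'motown', 'disco', 'r&b', 'ambient']
```

Unlike the chained replace() calls, both variants only touch the ends of each string, so an apostrophe or bracket inside a phrase would be kept. Whether that matters depends on the data.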

Upvotes: 2
