NicolasVega

Reputation: 61

Python - Remove non alphanumeric characters but keep spaces and Spanish/Portuguese characters

Say I have

text = "El próximo AÑO, vamos a salir a Curaçao... :@ :) será el día #MIÉRCOLES 30!!!!"

How can I turn it into the following, using a regex?

text2 = "El próximo AÑO vamos a salir a Curaçao será el día MIÉRCOLES 30"

Upvotes: 2

Views: 538

Answers (2)

blhsing

Reputation: 106598

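If you need compatibility with Python 2.7, you can filter the characters with the str.isalpha(), str.isdigit() and str.isspace() methods (on a unicode string, so that accented letters count as alphabetic), then collapse the leftover runs of spaces with re.sub: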

# -*- coding: utf-8 -*-
import re
text = u"El próximo AÑO, vamos a salir a Curaçao... :@ :) será el día #MIÉRCOLES 30!!!!"
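# Keep only letters, digits and whitespace, then collapse the runs of spaces left where punctuation was removed.
print(re.sub(' +', ' ', ''.join(c for c in text if c.isalpha() or c.isdigit() or c.isspace())))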

This outputs:

El próximo AÑO vamos a salir a Curaçao será el día MIÉRCOLES 30
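If Python 2.7 compatibility is not required, the same idea can probably be written without re at all. The following is only a sketch, assuming Python 3 (where str is Unicode); the function name keep_alnum_and_spaces is made up for illustration:

```python
def keep_alnum_and_spaces(text):
    # Keep letters and digits (Unicode-aware on Python 3 str) plus whitespace.
    kept = ''.join(c for c in text if c.isalnum() or c.isspace())
    # split() with no arguments splits on whitespace runs and drops empty
    # pieces, so joining with ' ' collapses repeated spaces and trims the ends.
    return ' '.join(kept.split())

text = "El próximo AÑO, vamos a salir a Curaçao... :@ :) será el día #MIÉRCOLES 30!!!!"
print(keep_alnum_and_spaces(text))
```

Using split() and join() instead of re.sub(' +', ' ', ...) also normalizes tabs and newlines to single spaces, which may or may not be what you want.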

Upvotes: 0

Carlos Mermingas

Reputation: 3892

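You can try replacing each run of characters matched by the \W character class (anything that is not a word character) with a single space. In Python 3, \W is Unicode-aware for str patterns, so accented letters are kept; in Python 2, use a unicode string and pass flags=re.UNICODE. The final strip() removes the trailing space left by the "!!!!" at the end:

re.sub(r'\W+', ' ', text).strip()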
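One caveat: \w counts the underscore as a word character, so \W+ leaves underscores in the text. If they should go too, a variant along these lines might work. It is a sketch assuming Python 3, and the function name strip_non_alnum is hypothetical:

```python
import re

def strip_non_alnum(text):
    # [\W_] matches any non-word character plus the underscore, which \w
    # would otherwise keep. Each run is replaced by a single space.
    cleaned = re.sub(r'[\W_]+', ' ', text)
    # Punctuation at the start or end of the input leaves a stray space.
    return cleaned.strip()

text = "El próximo AÑO, vamos a salir a Curaçao... :@ :) será el día #MIÉRCOLES 30!!!!"
print(strip_non_alnum(text))
```

Because whitespace is itself matched by \W, runs of spaces and punctuation collapse in a single pass, with no separate space-squeezing step.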

Upvotes: 1
