Asqan

Reputation: 4479

Unwanted spaces around characters in regex

Consider the following string variable:

data = '23jodfjkle lj ioerz\nlkdsjflj sldjj\\difd ioiörjlezr'

What I want to create is a string containing only alphabetical characters, the character \n and the character ö. Therefore I wrote the following:

(" ".join(re.findall("[a-zA-Z]+|\n|ö", data)))

But what I get is:

'jodfjkle lj ioerz \n lkdsjflj sldjj difd ioi ö rjlezr'

Why are there spaces around the characters \n and ö? What should I change to get a result without those spaces:

'jodfjkle lj ioerz\nlkdsjflj sldjj difd ioiörjlezr'

Upvotes: 2

Views: 49

Answers (1)

gtlambert

Reputation: 11961

Because of the | operator in your regex, re.findall() returns each run of [a-zA-Z]+, each \n and each ö as a separate match. " ".join() then puts a space between every pair of adjacent matches, so \n and ö end up with a space on either side.
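As a rough illustration of the explanation above, the sketch below prints the list that re.findall() returns for the question's pattern before it is joined. The variable names matches and joined are only illustrative and do not come from the question.

```python
import re

data = '23jodfjkle lj ioerz\nlkdsjflj sldjj\\difd ioiörjlezr'

# Each alternative of the pattern (letters run, newline, ö) yields its own list item.
matches = re.findall("[a-zA-Z]+|\n|ö", data)
print(matches)
# ['jodfjkle', 'lj', 'ioerz', '\n', 'lkdsjflj', 'sldjj', 'difd', 'ioi', 'ö', 'rjlezr']

# join() inserts the separator between every pair of neighbouring items,
# including the standalone '\n' and 'ö' items.
joined = " ".join(matches)
print(repr(joined))
# 'jodfjkle lj ioerz \n lkdsjflj sldjj difd ioi ö rjlezr'
```

Since '\n' and 'ö' are list items of their own, each one gets a separator on both sides.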

To achieve your desired output, move the \n and ö inside the square brackets so that they are matched as part of the same run as the letters next to them:

print(" ".join(re.findall("[a-zA-Z\nö]+", data)))

Output (with the newline shown escaped as \n)

jodfjkle lj ioerz\nlkdsjflj sldjj difd ioiörjlezr
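For anyone who wants to check this, here is a minimal self-contained sketch of the character-class version. It adds the import and uses repr() so the newline stays visible as \n; the names runs and cleaned are hypothetical and chosen only for this example.

```python
import re

data = '23jodfjkle lj ioerz\nlkdsjflj sldjj\\difd ioiörjlezr'

# With \n and ö inside the character class, they extend the surrounding
# run of letters instead of forming matches of their own.
runs = re.findall("[a-zA-Z\nö]+", data)
print(runs)
# ['jodfjkle', 'lj', 'ioerz\nlkdsjflj', 'sldjj', 'difd', 'ioiörjlezr']

cleaned = " ".join(runs)
print(repr(cleaned))
# 'jodfjkle lj ioerz\nlkdsjflj sldjj difd ioiörjlezr'
```

Note that the digits and the backslash are still dropped, and the backslash between sldjj and difd becomes a space because it separates two matches.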

Upvotes: 4
