Reputation: 33

Text stripping issue

Apologies in advance if this turns out to be a PEBKAC issue, but I can't see what I'm doing wrong.

Python 3.5.1 (FWIW)

I've pulled data from an online source. Each line of the page has \r\n, etc. removed with .strip() and is converted to a UTF-8 string. The lines I'm looking for are reduced further, as shown below.

I want to take two strings, join them and strip out all the non-alphanumerics.

> x = "ABC"
> y = "Some-text as an example."
> z = x+y.lower()

> type(z)
<class 'str'>

So here's the problem.

> z = z.strip("'-. ")
> print(z)

Why is the result:

ABCsome-text as an example.

and not, as I would like:

ABCsometextasanexample

I can get it to work with four .replace() calls, but strip really doesn't want to work here. I've also tried separate strip calls:

> y = y.strip("-")
> print(y)
Some-text as an example.

Whereas

> y = y.replace("-", '')
> print(y)
Sometext as an example.

Any thoughts on what I might be doing wrong with .strip()?

Upvotes: 2

Answers (4)

martineau

Reputation: 123473

As others have pointed out, the problem with strip() is that it only operates on characters at the beginning and end of strings—so using replace() multiple times would be the way to accomplish what you want using just string methods.
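
To illustrate the replace() route mentioned above, here is a minimal sketch that loops over the unwanted characters instead of writing four separate calls. The helper name remove_chars is hypothetical and only used for illustration.

```python
def remove_chars(text, chars):
    """Remove every occurrence of each character in `chars` from `text`."""
    for ch in chars:
        # str.replace returns a new string, so the result must be reassigned.
        text = text.replace(ch, "")
    return text


z = "ABC" + "Some-text as an example.".lower()
cleaned = remove_chars(z, "'-. ")
print(cleaned)  # ABCsometextasanexample
```

Because strings are immutable, calling replace() without assigning the result leaves the original value unchanged.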

Although not the question you asked, here's how to do it with a single call to the re.sub() function in the re regular-expression module. The characters to be removed are defined by the contents of the string variable named chars.

import re

x = "ABC"
y = "Some-text as an example."
z = x + y.lower()

print('before: {!r}'.format(z))  # -> before: 'ABCsome-text as an example.'

chars = "'-. "  # Characters to be replaced.
z = re.sub('(' + '|'.join(re.escape(ch) for ch in chars) + ')', '', z)

print('after: {!r}'.format(z))  # -> after: 'ABCsometextasanexample'

Upvotes: 0

ems

Reputation: 990

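Another solution would be using Python's filter(). In Python 3 it returns an iterator, so the result has to be joined back into a string:

x = "ABC"
y = "Some-text as an example."
z = x + y.lower()

# Keep only alphanumeric characters and rebuild the string.
z = ''.join(filter(str.isalnum, z))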

Upvotes: 0

ems

Reputation: 990

Since you wish to remove all the non-alphanumeric characters, let's make it more generic using:

import re

x = "ABC"
y = "Some-text as an example."
z = x+y.lower()

z = re.sub(r'\W+', '', z)
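
One caveat worth sketching: \W matches anything that is not a "word" character, and the underscore counts as a word character, so it survives the substitution. The sample string below is made up for illustration, and the explicit ASCII character class is one possible alternative, not the only one.

```python
import re

sample = "ABCsome-text as_an example."  # hypothetical input containing an underscore

# \W+ removes punctuation and spaces but keeps the underscore.
with_underscore = re.sub(r'\W+', '', sample)

# An explicit class removes everything except ASCII letters and digits.
ascii_only = re.sub(r'[^A-Za-z0-9]+', '', sample)

print(with_underscore)  # ABCsometextas_anexample
print(ascii_only)       # ABCsometextasanexample
```

If the input never contains underscores, the two patterns give the same result for ASCII text.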

Upvotes: 2

Bryan Oakley

Reputation: 385980

strip() doesn't remove every occurrence of the given characters; it only removes them from the ends of the string.

From the official documentation:

Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
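
A small sketch of that behaviour, using a made-up string with unwanted characters at both ends and in the middle:

```python
s = "--.some-text as an example. "  # hypothetical input

# Characters from the set are removed from both ends, in any order,
# until a character outside the set is reached.
stripped = s.strip("'-. ")
print(repr(stripped))  # 'some-text as an example'

# The interior hyphen and spaces are untouched.
print("-" in stripped, " " in stripped)  # True True
```

The leading "--." and the trailing ". " disappear, but nothing between the first and last kept characters is examined.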

Upvotes: 1
