user3596479

Reputation: 55

Removing '\xa0' character from a multidimensional list

Consider the following list (I forgot to mention that my list also contains numbers, i.e. ints):

foo_list = [['foo', 100], ['\xa0foo', 200], ['foo\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', 300], ['foo', 400]]

I've tried cleaning the list with the following function I found on SO when I was googling:

def remove_from_list(l, x):
  new_list = [li.replace(x, u'') for li in l]
  return new_list

foo_list_clean = remove_from_list(foo_list, u'\xa0')

This obviously gives me: (a new error)

AttributeError: 'int' object has no attribute 'replace'

Is it because it's a list of lists? How could I modify the code so that it works and removes the '\xa0' character?

My expected output would be a new list with cleaned values from foo_list.

Upvotes: 3

Views: 1043

Answers (1)

Martijn Pieters

Reputation: 1124070

Simply use str.strip() on the first element, leaving the rest of the inner list intact:

[[inner[0].strip('\xa0')] + inner[1:] for inner in foo_list]

\xa0 is a non-breaking space, and provided your values are Unicode strings, it will be stripped without specifying an argument. Your sample input consists of bytestrings, so I used an explicit strip:

>>> foo_list = [['foo', 100], ['\xa0foo', 200], ['foo\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', 300], ['foo', 400]]
>>> [[inner[0].strip('\xa0')] + inner[1:] for inner in foo_list]
[['foo', 100], ['foo', 200], ['foo', 300], ['foo', 400]]

Your own approach would work fine too, but you need to use the function on slices of each nested list:

foo_list_clean = [remove_from_list(inner[:1], u'\xa0') + inner[1:] for inner in foo_list]

However, using str.replace() is not needed unless you have those \xa0 non-breaking spaces in between words; your sample only contains them at the starts and ends.

Note that if some elements are integers and others are strings, you'll have to do some duck typing:

[[s.strip('\xa0') if hasattr(s, 'strip') else s for s in inner]
 for inner in foo_list]
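If the non-breaking spaces can also appear between words, the same duck-typing check can be combined with the str.replace() approach from the question. The following is only a sketch: it reuses the asker's function name remove_from_list, but the nested comprehension and the sample data are my own illustration, and it assumes the string cells are Unicode strings (str on Python 3, unicode on Python 2):

```python
def remove_from_list(rows, x):
    """Return a new list of lists with every occurrence of x removed from string cells."""
    return [
        # Only cells that have a .replace() method (such as strings) are changed;
        # integers and similar values pass through untouched.
        [cell.replace(x, u'') if hasattr(cell, 'replace') else cell for cell in inner]
        for inner in rows
    ]

# Hypothetical sample with a non-breaking space in the middle of a value.
sample = [[u'foo\xa0bar', 100], [u'\xa0foo', 200], [u'foo\xa0\xa0', 300]]
sample_clean = remove_from_list(sample, u'\xa0')
```

Because the replacement is an empty string, u'foo\xa0bar' becomes u'foobar'; replace with u' ' instead if the words should stay separated.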

Note that if your inputs are instead unicode objects, you'll have to use a matching u'\xa0' string to strip with! Alternatively, just use unicode.strip() without arguments to remove all whitespace from the start and end (as \xa0 is U+00A0 NO-BREAK SPACE and is considered whitespace):

>>> foo_list = [[u'foo', 100], [u'\xa0foo', 200], [u'foo\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0', 300], [u'foo', 400]]
>>> [[inner[0].strip()] + inner[1:] for inner in foo_list]
[[u'foo', 100], [u'foo', 200], [u'foo', 300], [u'foo', 400]]
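On Python 3 every str is a Unicode string, so the argument-less strip() should be enough there. A minimal sketch, assuming Python 3 and using isinstance() rather than hasattr() to leave the integers alone (the sample list is shortened from the question):

```python
foo_list = [['foo', 100], ['\xa0foo', 200], ['foo\xa0\xa0', 300], ['foo', 400]]

foo_list_clean = [
    # str.strip() with no argument removes all leading and trailing whitespace,
    # which on Python 3 includes U+00A0 NO-BREAK SPACE.
    [item.strip() if isinstance(item, str) else item for item in inner]
    for inner in foo_list
]

print(foo_list_clean)
# [['foo', 100], ['foo', 200], ['foo', 300], ['foo', 400]]
```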

Upvotes: 2
