Stephen Strosko

Reputation: 667

Using re.sub to clean nested lists

I have a set of nested lists (no more than three levels deep) that I need to clean. A similar example is this:

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

I would love to run the following:

re.sub(r'[^a-zA-Z\d\[\] ]', '', test)

I know the problem here is that re.sub works on strings, so I need to iterate over the nested list, but I am having trouble keeping the structure as I do so. Maybe there is also a simpler way to approach the problem. I have tried variations of this:

import re

clean = []  # flat result list: the original nesting is lost here
for a in test:
    for b in a:
        if isinstance(b, list):
            for c in b:
                c = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c)
                clean.append(c)
        else:
            print(b)
            b = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', b)
            clean.append(b)

Upvotes: 0

Views: 243

Answers (2)

Andrej Kesely

Reputation: 195543

This script leaves the structure of the list as it is and only applies re.sub to each string, recursing into any nested list it finds:

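import re

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

def clean(lst):
    # Base case: a non-list item is treated as a string and cleaned
    if not isinstance(lst, list):
        return re.sub(r'[^a-zA-Z\d\[\] ]', '', lst)

    # Recursive case: rebuild the list with every element cleaned
    return [clean(v) for v in lst]

print(clean(test))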

Prints:

[['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
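
If you need the same traversal for other clean-up steps, the recursion can be separated from the regex. This is only a sketch of that idea: the names map_nested and PATTERN are made up for illustration, and it assumes every leaf is a string and every container is a list, as in the question's example.

```python
import re

# Same character class as in the question: keep letters, digits,
# square brackets and spaces; everything else is removed.
PATTERN = re.compile(r'[^a-zA-Z\d\[\] ]')

def map_nested(func, item):
    """Apply func to every non-list leaf, preserving the list nesting."""
    if isinstance(item, list):
        return [map_nested(func, v) for v in item]
    return func(item)

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
cleaned = map_nested(lambda s: PATTERN.sub('', s), test)
print(cleaned)  # [['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
```

A new list is built at every level, so the original test list is left unchanged.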

Upvotes: 1

justhalf

Reputation: 9117

If you just need to collect all the nested lists into a single flattened list, you can run a flatten function on your list and then apply the regex to each item. Note that this does not keep the nesting.

import re

def flatten(lst):
    flat = []
    for x in lst:
        # Strings are iterable too, so exclude them explicitly
        # (Python 3: str/bytes replace Python 2's basestring)
        if hasattr(x, '__iter__') and not isinstance(x, (str, bytes)):
            flat.extend(flatten(x))
        else:
            flat.append(x)
    return flat

clean = []
for c in flatten(test):
    clean.append(re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c))
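
On Python 3 the same flattening idea can also be written as a generator. The following is a rough sketch rather than a drop-in replacement: iter_flat is a hypothetical name, it only descends into list objects, and it removes the unwanted characters (empty replacement, as in the question's intended call) instead of substituting a space.

```python
import re

def iter_flat(lst):
    """Yield the leaves of a nested list one by one, depth first."""
    for x in lst:
        if isinstance(x, list):
            yield from iter_flat(x)
        else:
            yield x

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
flat_clean = [re.sub(r'[^a-zA-Z\d\[\] ]', '', s) for s in iter_flat(test)]
print(flat_clean)  # ['qte', 'EKO', 'eoim', '35ni', 'mmie']
```

The result is a single flat list, so this only fits if the original nesting does not matter.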

Upvotes: 0
