Reputation: 667
I have a set of nested lists (no more than three deep) that I need to clean. A similar example is this:
test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
I would love to run the following:
re.sub(r'[^a-zA-Z\d\[\] ]', '', test)
I know the problem here is that I need to iterate over the nested list, but I am having trouble keeping the structure as I do so. Maybe there is also a simpler way to approach the problem. I have tried variations of this:
import re

clean = []
for a in test:
    for b in a:
        if isinstance(b, list):
            for c in b:
                c = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c)
                clean.append(c)
        else:
            print(b)
            b = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', b)
            clean.append(b)
Upvotes: 0
Views: 243
Reputation: 195543
This script leaves the structure of the list as it is and just applies the re.sub function to each string:
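import re

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

def clean(lst):
    # Base case: a string, so strip the unwanted characters.
    if not isinstance(lst, list):
        return re.sub(r'[^a-zA-Z\d\[\] ]', '', lst)
    # Recursive case: rebuild the list, cleaning each element.
    return [clean(v) for v in lst]

print(clean(test))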
Prints:
[['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
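The same recursion can be separated from the regex if you want to reuse it. This is only a sketch: `map_nested` and `strip_special` are hypothetical names introduced here, not part of any library, and it assumes every leaf is a string and every container is a list.

```python
import re

def map_nested(func, item):
    """Apply func to every non-list leaf while keeping the nesting intact."""
    if isinstance(item, list):
        return [map_nested(func, v) for v in item]
    return func(item)

# Compile the pattern once; same character class as in the question.
pattern = re.compile(r'[^a-zA-Z\d\[\] ]')

def strip_special(s):
    # Remove every character that the character class does not allow.
    return pattern.sub('', s)

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
result = map_nested(strip_special, test)
print(result)  # [['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
```

Any other one-argument function (for example `str.upper`) can be passed in place of `strip_special`.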
Upvotes: 1
Reputation: 9117
If you just need to collect all the nested strings into a single flattened list, you can run a flatten function on your list and then apply the regex to each item.
import re

def flatten(lst):
    flat = []
    for x in lst:
        # Recurse into any iterable except strings (in Python 3, str replaces basestring).
        if hasattr(x, '__iter__') and not isinstance(x, str):
            flat.extend(flatten(x))
        else:
            flat.append(x)
    return flat

clean = []
for c in flatten(test):
    clean.append(re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c))
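For reference, here is a Python 3 sketch of the same flattening idea written as a generator. `iter_flat` is a hypothetical name, and it assumes the containers are plain lists. Note that the result is flat, so the original nesting is not preserved, and that the replacement string is a space, so removed characters leave spaces behind.

```python
import re

def iter_flat(lst):
    """Yield the leaves of a nested list, depth first."""
    for x in lst:
        if isinstance(x, list):
            yield from iter_flat(x)
        else:
            yield x

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

# Each disallowed character is replaced by a single space.
flat_clean = [re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c) for c in iter_flat(test)]
print(flat_clean)  # ['qte  ', 'EKO  ', 'eoim ', '35ni ', 'mmie']
```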
Upvotes: 0