Reputation: 655
I want to search for a pattern in a string, then search for invalid characters within each match and either remove them or replace them with valid characters.
I have some sample dictionaries, e.g. sample_dict = {"randomId":"123y" uhnb\n g", "desc": ["sample description"]}
In this case I want to find a dictionary value, say "123y" uhnb\n g", and then remove the invalid characters in it, such as ", \t and \n.
What I have tried: I stored all the dictionaries in a file, then read the file and matched a pattern against the dictionary values. This only gives me a list of the matches. I can also compile these matches, but I am not sure how to perform the replacement in the original dictionary value so that my final output is:
{"randomId":"123y uhnb g", "desc": ["sample description"]}
pattern = re.findall("\":\"(.+?)\"", sample_dict)
expected result:
{"randomId":"123y uhnb g", "desc": ["sample description"]}
actual result:
['123y" uhnb\n g']
Upvotes: 1
Views: 62
Reputation: 20500
You can strip the non-alphanumeric characters from each value using re.sub, as below:
import re

dct = {"randomId": "123y uhnb\n g", "desc": ["sample description"]}

for key, value in dct.items():
    # If the value is a string, substitute directly
    if isinstance(value, str):
        dct[key] = re.sub(r"[^a-zA-Z0-9 ]", "", value)
    # If the value is a list, substitute in every item of the list
    elif isinstance(value, list):
        dct[key] = [re.sub(r"[^a-zA-Z0-9 ]", "", str(item)) for item in value]
    # Values of any other type are left unchanged

print(dct)
# {'randomId': '123y uhnb g', 'desc': ['sample description']}
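If the dictionaries are read from the file as raw text, as in the question, the stray quote makes the text unparseable, so the cleanup has to happen on the string before it becomes a dict. A minimal sketch of that, passing a replacement function to re.sub so each matched value is cleaned in place. The names raw, VALUE and clean_value are made up for illustration. The pattern assumes a string value starts right after ":" with no space and ends at a quote followed by "," or "}", so it would cut a value short if the value itself contained that sequence.

```python
import json
import re

# Hypothetical raw line as read from the file; note the stray quote and the newline
raw = '{"randomId":"123y" uhnb\n g", "desc": ["sample description"]}'

# Assumption: a value starts after '":"' and ends at a quote followed by ',' or '}'
VALUE = re.compile(r'(":")(.+?)("(?=\s*[,}]))', re.DOTALL)

def clean_value(match):
    # Keep only letters, digits and spaces inside the captured value
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", match.group(2))
    return match.group(1) + cleaned + match.group(3)

# re.sub calls clean_value for every match and splices the result back into the text
fixed = VALUE.sub(clean_value, raw)
print(fixed)

# The repaired text can now be parsed into a real dict
parsed = json.loads(fixed)
```

Unlike re.findall, which only returns the matched values, re.sub with a function rewrites them inside the original text, which is what the question asks for.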
Upvotes: 1