Reputation: 21
I have bytes in which \xa0 appears in place of a space. I tried many things to replace it with a space:
tmpstr = tmpstr.decode()
and then
tmpstr = unicodedata.normalize('NFD', tmpstr)
or
tmpstr = unicodedata.normalize('NFC', tmpstr)
or
tmpstr = unicodedata.normalize('NFKD', tmpstr)
I also tried:
tmpstr = tmpstr.replace(u'\xa0', u' ')
or
tmpstr = tmpstr.replace('\xa0', ' ')
Nothing works. Any ideas? Here are two example values of tmpstr:
tmpstr = 'reimbursement\xa0up to 100 Euros per day'
tmpstr = '{"amount_paid":0,"checked_in":false,"checkin_date":"","checkin_secret":"xxx","data":{"Accomodation":{"Accomodation":"Of your choice booked by yourself (reimbursement\u00a0up to 100 Euros per day)","Additional information":""},"Administration":{"Accomodation 1":"","Accomodation 2":"","Accomodation price 1":"","Accomodation price 2":"","Date of birth":"","From 2":"","Membership":"Accepted","Nationality":"","Notes":"","Other reimbursements":"","Phone number":"","To 2":"","Transport reimbursement":""},"Financial help for travel":{"Apply for reimbursement":"No","Could you please elaborate a bit upon the reasons why ?":"","What would be the estimated amount of the travel reimbursement you would need ?":""},"Participation mode":{"From":"31/10/2021","How do you plan to participate?":"In person","To":"13/11/2021"},"Personal Data":{"Affiliation":"xxx","Country":"Italy","Email Address":"xxx","Expertise and topic of research":"xxx","First Name":"xxx","Gender":"Male","I do not want my name and email address to be kept and used by the Institut Pascal for future mailings for and/or by the Institut Pascal":"No","I do not want my pictures to be published on the IPa website and social networks":"No","Last Name":"xxx","Position":"PostDoc","Reason that you are interested in participating in this program":"xxx","Special Requirements":"xxx","Would you need an official invitation for visa-purposes ?":"No"}},"event_id":xxx,"full_name":"xxx","paid":false,"personal_data":{"affiliation":"xxx","country":"xxx","email":"xxx","firstName":"xxx","phone":"","position":"xxx","surname":"xxx","title":"Male"},"price":0,"registrant_id":"11349","registration_date":"2021-07-10T22:08:12.380610+00:00","ticket_price":0}'
Thanks
Upvotes: 2
Views: 797
Reputation: 700
You can try passing your string through a regular expression like the one below:
[0-9a-zA-Z \.\-_{add more special characters}]*
You can paste your string here and see if this works for you before implementing in code
You can also look at this answer, which takes the opposite approach to mine: it removes the \x characters using a regex.
Example Python code:
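import re

text = "reimbursement\xa0up to 100 Euros per day"

# Option 1: keep only runs of allowed characters and rejoin them with single spaces.
# Use + rather than * so that findall does not return empty matches (which would double the spaces).
pattern = r'[0-9a-zA-Z.\-_]+'
print(" ".join(re.findall(pattern, text)))
# -----------------------------------------------
# Option 2: replace every run of non-ASCII characters with a single space.
print(re.sub(r'[^\x00-\x7F]+', ' ', text).strip())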
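If you would rather not drop other characters (the first pattern above also removes things like parentheses), here is a minimal sketch without regex. It rests on an assumption I cannot verify from the question: that the bytes are JSON text holding the six-character escape \u00a0 rather than a real U+00A0 character, which would explain why str.replace finds nothing after decode(). The helper replace_nbsp and the shortened payload are made up for illustration.

```python
import json
import unicodedata

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(value):
    """Hypothetical helper: recursively replace U+00A0 with a space in parsed JSON data."""
    if isinstance(value, str):
        return value.replace(NBSP, " ")
    if isinstance(value, list):
        return [replace_nbsp(item) for item in value]
    if isinstance(value, dict):
        return {replace_nbsp(k): replace_nbsp(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


# Case 1: the decoded string holds a real U+00A0 character.
raw = "reimbursement\xa0up to 100 Euros per day".encode("utf-8")
text = raw.decode("utf-8")
cleaned = text.replace(NBSP, " ")
# NFKC/NFKD fold U+00A0 into a plain space; NFC/NFD leave it alone.
assert unicodedata.normalize("NFKC", text) == cleaned
assert unicodedata.normalize("NFC", text) == text

# Case 2 (assumed): the bytes hold the JSON escape sequence \u00a0 as literal characters.
payload = rb'{"Accomodation": "reimbursement\u00a0up to 100 Euros per day"}'
decoded = payload.decode("utf-8")
assert NBSP not in decoded  # nothing for str.replace to match yet
data = replace_nbsp(json.loads(decoded))  # json.loads turns the escape into U+00A0
print(data["Accomodation"])
```

If the second case matches your data, parse the JSON first and clean the resulting values, rather than running replace on the raw text.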
Upvotes: 1