Reputation: 5386
I've tried a couple of different solutions to fix my problem with some "funny" newlines within my JSON dictionary, and none of them work, so I thought I would make a post. The dictionary is obtained by scraping a website.
I have a json dictionary:
my_dict = {
u"Danish title": u"Avanceret",
u"Course type": u"MScTechnol",
u"Type of": u"assessmen",
u"Date": u"\nof exami",
u"Evaluation": u"7 step sca",
u"Learning objectives": u"\nA studen",
u"Participants restrictions": u"Minimum 10",
u"Aid": u"No Aid",
u"Duration of Course": u"13 weeks",
u"name": u"Advanced u",
u"Department": u"31\n",
u"Mandatory Prerequisites": u"31545",
u"General course objectives": u"\nThe cour",
u"Responsible": u"\nMartin C",
u"Location": u"Campus Lyn",
u"Scope and form": u"Lectures, ",
u"Point( ECTS )": u"10",
u"Language": u"English",
u"number": u"31548",
u"Content": u"\nThe cour",
u"Schedule": u"F4 (Tues 1"
}
I have truncated the values to [:10] to reduce clutter, but some of the values are 300 characters long. It might not show well here, but some of the values contain a lot of newline characters, and I've tried several different ways to remove them, such as str.strip and str.replace, but without success because my values are unicode. By values I mean the value in key, value in my_dict.items().
How do I remove all the newlines appearing in my dictionary? (The focus is on the values, as some of the newlines are trailing, some are leading, and others are in the middle of the content, e.g. \nI have a\ngood\n idea\n).
I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    print key, value
Upvotes: 0
Views: 2623
Reputation: 14085
You need to put the updated value back into your dictionary (similar to a "by value vs. by reference" situation ;) ) ...
To remove the "\n", this one-liner may be more "pythonic":
new_test = {k: v.replace("\n", "") for k, v in test.iteritems()}
To do what your loop attempts, try something like:
new_test = {k: str(v[:10]).replace("\n", " ") for k, v in test.iteritems()}
In your code, value is rebound to the new string, but you never write it back to the dictionary. So, for example, this would work (it may be slower than the comprehension; it also assigns to the dictionary inside the loop, which is safe here because no keys are added or removed and items() returns a list copy in Python 2):
for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    # now put it back into the dictionary...
    test[key] = value
    print key, value
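As a rough sketch of the same idea without the str() call (so the values stay unicode and non-ASCII text cannot raise an encoding error), something like the following should work on Python 2.7 and Python 3. The sample dictionary and the name replace_newlines are made up for illustration and are not from the question.

```python
# Hypothetical sample data standing in for the scraped dictionary.
sample = {
    u"Date": u"\nof exami",
    u"Department": u"31\n",
    u"Content": u"\nI have a\ngood\n idea\n",
}


def replace_newlines(d):
    """Return a new dict with every newline in the values turned into a space."""
    # strip() then drops the spaces left over from leading/trailing newlines.
    return {k: v.replace(u"\n", u" ").strip() for k, v in d.items()}


cleaned = replace_newlines(sample)
```

Note that this swaps each newline for exactly one space, so a newline that sits next to an existing space still leaves two spaces in a row.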
Upvotes: 0
Reputation: 1718
If you're trying to remove all \n or any junk character apart from numbers or letters, then use a regex:
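import re

for key in my_dict.keys():
    # '\n' is a real newline; '\\n' would only match a backslash followed by n
    my_dict[key] = my_dict[key].replace('\n', '')
    # drop everything that is not a letter, a digit or a space
    my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict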
If you wish to keep any other characters, add them to the character class inside the regex.
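If the goal is only to turn newlines into single spaces while keeping punctuation and non-ASCII letters, a narrower pattern might be a better fit. The sketch below collapses every run of whitespace into one space; the names WHITESPACE_RUN and collapse_whitespace and the sample data are illustrative assumptions, not part of the question.

```python
import re

# Matches any run of whitespace (newlines, tabs, spaces).
# re.UNICODE makes \s Unicode-aware on Python 2; Python 3 does this by default for text.
WHITESPACE_RUN = re.compile(r"\s+", re.UNICODE)


def collapse_whitespace(text):
    """Replace each whitespace run with one space and trim both ends."""
    return WHITESPACE_RUN.sub(u" ", text).strip()


# Hypothetical sample data standing in for the scraped dictionary.
sample = {
    u"Responsible": u"\nMartin C",
    u"Content": u"\nI have a\ngood\n idea\n",
}

cleaned = {k: collapse_whitespace(v) for k, v in sample.items()}
```

Unlike the character-class approach, this leaves commas, parentheses and accented letters untouched.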
Upvotes: 1
Reputation: 168
To remove '\n', try this:
for key, value in my_dict.items():
    my_dict[key] = ''.join(value.split('\n'))
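One thing to watch with joining on an empty string is that words on either side of a newline get glued together. Since the question asks for a single space instead, a variant using split() with no argument (which splits on any whitespace run and ignores leading and trailing whitespace) may be closer to what is wanted. The sample data below is made up for illustration.

```python
# Hypothetical sample data standing in for the scraped dictionary.
my_sample = {
    u"Responsible": u"\nMartin C",
    u"Content": u"\nI have a\ngood\n idea\n",
}

# Joining on the empty string glues neighbouring words together.
glued = u"".join(my_sample[u"Content"].split(u"\n"))

# list(...) takes a snapshot of the items, so updating values while looping
# is safe on Python 3 as well as Python 2.
for key, value in list(my_sample.items()):
    # split() with no argument breaks on any whitespace run; rejoin with one space.
    my_sample[key] = u" ".join(value.split())
```

Here glued ends up containing "agood", while the values written back into my_sample have exactly one space between words.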
Upvotes: 0