Zeliax

Reputation: 5386

Removing \n \\n and other unwanted characters from a json unicode dictionary with python

I've tried several approaches to get rid of some stray newlines in my JSON dictionary, and none of them work, so I'm posting here. The dictionary is obtained by scraping a website.

I have a json dictionary:

my_dict = {
    u"Danish title": u"Avanceret", 
    u"Course type": u"MScTechnol",
    u"Type of":  u"assessmen",
    u"Date": u"\nof exami",
    u"Evaluation": u"7 step sca",
    u"Learning objectives": u"\nA studen",
    u"Participants restrictions": u"Minimum 10",
    u"Aid": u"No Aid",
    u"Duration of Course": u"13 weeks",
    u"name": u"Advanced u",
    u"Department": u"31\n",
    u"Mandatory Prerequisites": u"31545",
    u"General course objectives": u"\nThe cour",
    u"Responsible": u"\nMartin C",
    u"Location": u"Campus Lyn",
    u"Scope and form": u"Lectures, ",
    u"Point( ECTS )": u"10",
    u"Language": u"English",
    u"number": u"31548",
    u"Content": u"\nThe cour",
    u"Schedule": u"F4 (Tues 1"
}

I have truncated each value to its first 10 characters ([:10]) to reduce clutter, but some of the values are 300 characters long. It may not show well here, but some of the values contain many newline characters. I have tried several ways to remove them, such as str.strip and str.replace, without success; I suspect this is because my values are unicode. By values I mean the value in for key, value in my_dict.items().

How do I remove all the newlines appearing in my dictionary? (The focus is on the values: some of the newlines are trailing, some are leading, and others are in the middle of the content, e.g. \nI have a\ngood\n idea\n).

EDIT

I am using Python v. 2.7.11 and the following piece of code doesn't produce what I need. I want all the newlines to be changed to a single whitespace character.

for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    print key, value

Upvotes: 0

Views: 2623

Answers (3)

ntg

Reputation: 14085

You need to put the updated value back into your dictionary (similar to a "by value vs. by reference" situation).

To remove the "\n", this one-liner may be more "pythonic":

new_test ={ k:v.replace("\n", "") for k,v in test.iteritems()}

To do what you are attempting in your loop, try something like:

new_test = {k: str(v[:10]).replace("\n", " ") for k, v in test.iteritems()}

In your code, value is rebound to the new string, but you never write it back to the dictionary. For example, this would work (it is a little slower than the comprehension; assigning to existing keys while looping over test.items() is safe, since no keys are added or removed):

for key, value in test.items():
    value = str(value[:10]).replace("\n", " ")
    #now put it back to the dictionary...
    test[key]=value
    print key, value
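As a variation on the loop above, here is a minimal sketch that skips the str() call, since under Python 2 str() on a unicode value can raise UnicodeEncodeError when the text contains non-ASCII characters. The sample dictionary is made up for illustration, and the final strip() is an assumption about what should happen to leading and trailing newlines:

```python
# Hypothetical sample data; the real dictionary comes from the scraper.
test = {
    u"Department": u"31\n",
    u"Responsible": u"\nMartin C",
    u"Note": u"\nI have a\ngood\n idea\n",
}

# replace() works directly on unicode values, so no str() conversion is needed.
# Each newline becomes one space; strip() then drops the spaces left at the edges.
cleaned = {k: v.replace(u"\n", u" ").strip() for k, v in test.items()}
```

Note that a newline next to an existing space (as in "good\n idea") leaves two spaces in a row, because each newline is replaced one for one.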

Upvotes: 0

sameera sy

Reputation: 1718

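If you're trying to remove all \n or any junk character other than numbers or letters, use a regex:

import re

for key in my_dict.keys():
    # drop literal backslash-n sequences left over from scraping
    my_dict[key] = my_dict[key].replace('\\n', '')
    # keep only letters, digits and spaces (this also removes real newlines)
    my_dict[key] = re.sub('[^A-Za-z0-9 ]+', '', my_dict[key])
print my_dict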

If you wish to keep any other characters, add them to the character class inside the regex.
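If punctuation should survive and newlines should become a single space rather than disappear, a narrower pattern may be enough. This is only a sketch: clean_value is a hypothetical helper name, and it assumes that both real newlines and literal backslash-n sequences count as unwanted.

```python
import re

# Matches a run made of literal backslash-n sequences and/or any whitespace
# (real newlines, tabs, spaces).
_NEWLINE_RUN = re.compile(r"(?:\\n|\s)+")

def clean_value(text):
    """Hypothetical helper: collapse each run to one space, trim the ends."""
    return _NEWLINE_RUN.sub(u" ", text).strip()
```

For instance, clean_value(u"\\nThe cour\nse") gives u"The cour se". The literal backslash-n branch would also hit legitimate text such as a Windows path containing "\n", so drop it if the scraped data never contains escaped newlines.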

Upvotes: 1

Suraj

Reputation: 168

To remove '\n', try this:

for key, value in my_dict.items():
    my_dict[key] = ''.join(value.split('\n'))
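A closely related sketch, in case the goal is a single space in place of each newline: calling split() with no argument splits on any run of whitespace and discards empty pieces at the ends, so joining with a space also trims leading and trailing newlines. The names collapse_whitespace and sample are made up for illustration.

```python
def collapse_whitespace(text):
    """Hypothetical helper: turn every run of whitespace into one space."""
    return u" ".join(text.split())

# Made-up sample data standing in for the scraped dictionary.
sample = {
    u"Department": u"31\n",
    u"Note": u"\nI have a\ngood\n idea\n",
}

sample = {k: collapse_whitespace(v) for k, v in sample.items()}
```

This also collapses ordinary double spaces and tabs, which may or may not be wanted.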

Upvotes: 0
