Prachi Vaishnav

Reputation: 48

Removing duplicate words from a string in Python

I am trying to remove duplicate words from an input string using OrderedDict. I tried two nearly identical approaches, but they produce different output. Can anyone explain why this happens?

Code 1:

    from collections import OrderedDict
    data = "the an a the"
    data = "".join(OrderedDict.fromkeys(data))
    print(data)

Code 2:

    from collections import OrderedDict
    data = "the an a the"
    data = "".join(OrderedDict.fromkeys(data.split(" ")))
    print(data)

Output of Code 1: "the an"

Output of Code 2: "theana"

Why do the two outputs differ? I expected the result "the an a"; how can I get it?

Upvotes: 1

Views: 217

Answers (3)

ruhaib

Reputation: 649

OrderedDict.fromkeys(data) builds a dictionary whose keys are the individual characters of the string data. Result:

    {
        't': None,
        'h': None,
        'e': None,
        ...
    }

whereas OrderedDict.fromkeys(data.split(" ")) builds a dictionary whose keys are the words in the string (more precisely, the pieces obtained by splitting on a space). Result:

    {
        'the': None,
        'an': None,
        ...
    }

Since you want a space-separated result, join the words back together with a space:

    " ".join(OrderedDict.fromkeys(data.split(" ")))  # note the space inside the first pair of quotes
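As a runnable version of the fix above, here is a minimal sketch using the input string from the question. The variable name result is only illustrative. The last two lines show what "everything split by space" means in practice when the input contains a doubled space.

```python
from collections import OrderedDict

data = "the an a the"

# Split on a space, keep the first occurrence of each word, rejoin with a space.
result = " ".join(OrderedDict.fromkeys(data.split(" ")))
print(result)  # the an a

# split(" ") splits on every single space, so a doubled space yields an empty piece.
print("the  an".split(" "))  # ['the', '', 'an']
# split() with no argument splits on runs of whitespace instead.
print("the  an".split())     # ['the', 'an']
```

If the input may contain irregular spacing, split() without an argument is probably the safer choice.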

Also, try debugging your code. Debugging is a major part of programming, and it gives you a much deeper understanding of the code as well.

Upvotes: 0

Underoos

Reputation: 5200

In your 1st approach:

    data = "".join(OrderedDict.fromkeys(data))

treats the variable data as an iterable. Iterating over a string yields its characters one at a time, so the unique characters are t, h, e, the space, a, and n, and the ordered dictionary is created with 6 keys in total.


In your 2nd approach:

    data = "".join(OrderedDict.fromkeys(data.split(" ")))

you split the string into a list (which is also an iterable) whose elements are the, an, a, and the. The ordered dictionary is therefore created with 3 unique words as keys: the, an, and a.

In the final step you join them, which means only the keys are concatenated into a string. Because the separator is the empty string, the words run together as "theana"; joining with " " instead gives "the an a".
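To see the difference directly, the keys produced by each approach can be printed before joining. This is a small sketch; the names char_keys and word_keys are illustrative and do not appear in the question.

```python
from collections import OrderedDict

data = "the an a the"

# Approach 1: iterating over the string yields single characters.
char_keys = list(OrderedDict.fromkeys(data))
# Approach 2: iterating over the split list yields whole words.
word_keys = list(OrderedDict.fromkeys(data.split(" ")))

print(char_keys)           # ['t', 'h', 'e', ' ', 'a', 'n']
print(word_keys)           # ['the', 'an', 'a']
print("".join(char_keys))  # the an
print("".join(word_keys))  # theana
```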

Hope this helps.

Upvotes: 3

suman das

Reputation: 367

    string1 = "the an a the"
    words = string1.split()
    print(" ".join(sorted(set(words), key=words.index)))
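The snippet above works because set(words) removes the duplicates and sorting by words.index restores first-seen order. Each words.index call scans the list, so the cost grows quadratically with the number of words. As an alternative sketch, assuming Python 3.7 or later (where a plain dict preserves insertion order), dict.fromkeys does the same job in one pass. The function name dedupe_words is hypothetical.

```python
def dedupe_words(text):
    """Drop repeated words, keeping the first occurrence of each."""
    words = text.split()
    # dict keys are unique and keep insertion order (Python 3.7+).
    return " ".join(dict.fromkeys(words))


string1 = "the an a the"
words = string1.split()

print(dedupe_words(string1))                          # the an a
print(" ".join(sorted(set(words), key=words.index)))  # the an a
```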

Upvotes: 0
