Reputation: 48
I am trying to remove duplicate words from an input string using OrderedDict. I tried two nearly identical approaches, but they produce different output. Can anyone explain why this happens?
Code 1:
from collections import OrderedDict
data = "the an a the"
data="".join(OrderedDict.fromkeys(data))
print(data)
Code 2:
from collections import OrderedDict
data = "the an a the"
data = "".join(OrderedDict.fromkeys(data.split(" ")))
print(data)
The output of Code 1 is "the an" and the output of Code 2 is "theana". Why do the two differ? I expect the result "the an a", so how can I get it?
Upvotes: 1
Views: 217
Reputation: 649
OrderedDict.fromkeys(data)
will make a dictionary whose keys are the individual characters of the string "data", because iterating over a string yields one character at a time.
result:
{
't': None,
'h': None,
'e': None,
...
}
whereas:
OrderedDict.fromkeys(data.split(" "))
will make a new dictionary whose keys are the words in the string (or, more precisely, the pieces you get by splitting on a single space).
result:
{
'the': None,
'an': None,
...
}
Since you want a space-separated result, join the keys back with a space (" ") instead of an empty string (""):
" ".join(OrderedDict.fromkeys(data.split(" ")))  # note the space inside the quotes
Also, try debugging your code. Debugging is a major part of programming, and it gives you a much deeper understanding of what the code does.
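As a quick way to debug this, you could print the intermediate keys for both variants. This is only a small sketch; the variable names char_keys, word_keys and result are mine, not from the question.

```python
from collections import OrderedDict

data = "the an a the"

# Iterating over a string yields single characters (including the space).
char_keys = list(OrderedDict.fromkeys(data))
print(char_keys)   # ['t', 'h', 'e', ' ', 'a', 'n']

# Splitting first yields whole words, so duplicates are removed per word.
word_keys = list(OrderedDict.fromkeys(data.split(" ")))
print(word_keys)   # ['the', 'an', 'a']

# Joining with a space puts the separators back between the unique words.
result = " ".join(word_keys)
print(result)      # the an a
```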
Upvotes: 0
Reputation: 5200
In your 1st approach:
data="".join(OrderedDict.fromkeys(data))
treats the variable data as an iterable. Iterating over a string yields its characters, so the unique characters are t, h, e, (space), a, n, and the ordered dictionary is created with 6 keys in total.
In your 2nd approach:
data = "".join(OrderedDict.fromkeys(data.split(" ")))
you split the string into a list (which is also an iterable). The list elements are the, an, a, the, so the ordered dictionary is created with 3 unique values as keys: the, an, a.
In the final step you join them. Iterating over a dictionary yields only its keys, so the keys are concatenated into a single string using whatever separator you call join on.
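Putting the two steps together (split into words, then join the unique keys), a small helper might look like the sketch below. The name dedupe_words is made up for illustration, and using split() with no argument is my assumption, chosen so that runs of whitespace do not produce empty-string keys.

```python
from collections import OrderedDict

def dedupe_words(text):
    """Return text with repeated words removed, keeping first occurrences in order."""
    # split() with no argument splits on any run of whitespace.
    words = text.split()
    # fromkeys keeps the first occurrence of each word, in order.
    return " ".join(OrderedDict.fromkeys(words))

print(dedupe_words("the an a the"))      # the an a
print(dedupe_words("the  an   a the"))   # the an a
```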
Hope this helps.
Upvotes: 3
Reputation: 367
string1 = "the an a the"
words = string1.split()
# set() removes duplicates; sorting by each word's first index restores the original order.
print(" ".join(sorted(set(words), key=words.index)))
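One thing to keep in mind with this approach: words.index scans the list for every unique word, so it can get slow on long inputs. As a rough alternative, assuming Python 3.7 or later (where a plain dict preserves insertion order), dict.fromkeys should give the same result in a single pass. The names by_index and by_dict are just for this comparison.

```python
string1 = "the an a the"
words = string1.split()

# Approach from the answer: deduplicate with a set, then restore order by first index.
by_index = " ".join(sorted(set(words), key=words.index))

# Alternative: a dict keeps keys in insertion order (Python 3.7+), dropping repeats.
by_dict = " ".join(dict.fromkeys(words))

print(by_index)   # the an a
print(by_dict)    # the an a
```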
Upvotes: 0