Kevin

Reputation: 215

Delete duplicated words separated by comma

I am new to Python and I have a text file with the following content:

ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3

I want to delete the duplicate entries. My expected output is the following:

ss ss1, ss ss2, ss ss3

I am using the code below:

f = open('a.txt', 'r')
file_contents = f.read()
words = file_contents.split()
SS=",".join(sorted(set(words), key=words.index))
print SS

My current output is:

ss,ss1,,ss2,,ss3,,ss2

Upvotes: 1

Views: 80

Answers (1)

McGrady

Reputation: 11487

If you don't need to preserve the order of the items, split on ', ' (comma plus space) and pass the result through a set:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> list(set( f.split(', ')))
['ss ss2', 'ss ss3', 'ss ss1']

A set does not keep the original order. To preserve it, collect the items in a list and skip any item that is already there:

>>> f="ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"
>>> result=[]
>>> for i in f.split(', '):
...     if i not in result:
...         result.append(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']
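
As a more compact alternative to the loop above, here is a minimal sketch using dict.fromkeys. It assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order; the variable names text and unique_items are only illustrative.

```python
text = "ss ss1, ss ss2, ss ss3, ss ss2, ss ss2, ss ss3"

# dict keys are unique, and on Python 3.7+ they keep insertion order,
# so this drops repeats while preserving the first occurrence of each item.
unique_items = list(dict.fromkeys(text.split(", ")))
print(unique_items)
```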

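By the way, if the list is very large, it is more efficient to track the items already seen in a set. Checking membership in a set takes roughly constant time, while "i not in result" scans the whole list on every iteration: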

>>> result=[]
>>> s=set()
>>> for i in f.split(', '):
...     if i not in s:
...         result.append(i)
...         s.add(i)
...
>>> result
['ss ss1', 'ss ss2', 'ss ss3']
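
To apply this to the file from the question and get a comma-separated string back, the same idea can be wrapped in a function. This is only a sketch: the function names dedupe_comma_separated and dedupe_file are made up for illustration, and it assumes the items are separated by commas with optional surrounding whitespace.

```python
def dedupe_comma_separated(text):
    """Return text with repeated comma-separated items removed, order kept."""
    seen = set()
    result = []
    for item in text.split(","):
        item = item.strip()  # drop spaces and the trailing newline of the file
        if item and item not in seen:
            seen.add(item)
            result.append(item)
    return ", ".join(result)


def dedupe_file(path):
    """Read the whole file at path and de-duplicate its contents."""
    with open(path) as f:
        return dedupe_comma_separated(f.read())
```

With the a.txt from the question, print(dedupe_file('a.txt')) should give the expected line "ss ss1, ss ss2, ss ss3". Splitting on "," and stripping each item, rather than splitting on ', ', also tolerates irregular spacing between entries.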

Upvotes: 2
