user3496483

Reputation: 35

Python Duplicate Words in Text File

I have a text file that looks like this:

Still not working.

['None']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['Funk']['8^)->-<']['8^)->-<']['8^)->-<']['Vega~']['violence']['Zanaz']['Funk']['puker']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['None']['Lawn']['Lawn']['Lawn']['Leafy']['Judge69']['David']['lilwade']['Pity.']['artofwar']['Hazecloud']['Lawn']['Lawn']['Lawn']['Judge69']['Leafy']['David']['lilwade']['Hazecloud']['Lawn']['Lawn']['Lawn']['Leafy']['David']['Pity.']['lilwade']['artofwar']['Judge69']

I need to remove all the duplicates so that each name appears only once, and the names must keep the order they are in.

   fo = open('C:\Python26\myfile.txt','r')
   name_cache = fo.readlines()
   typea = name_cache[0]

   def unique_list(l):
      ulist = []
      [ulist.append(x) for x in l if x not in ulist]
      return ulist

   mast =' '.join(unique_list(typea.split()))
   print mast

Upvotes: 1

Views: 414

Answers (4)

Sina Khelil

Reputation: 1991

Solution that keeps the brackets around the names:

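with open('myfile.txt', 'r') as fo:
    # strip() drops the trailing newline so the last name matches its earlier copies
    name_cache = fo.readline().strip()
names = []
# Put a comma between adjacent entries, then split on it (assumes no name contains a comma)
for name in name_cache.replace('][', '],[').split(','):
    if name not in names:
        names.append(name)

print(names)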

Upvotes: 0

James

Reputation: 1238

First remove the leading [ and trailing ], then split on ][. For example:

>>> x="['None']['Vega~']['Vega~']"
>>> x.rstrip(']').lstrip('[').split('][') 
["'None'", "'Vega~'", "'Vega~'"]

Then call your unique_list.

>>> y = x.rstrip(']').lstrip('[').split('][') 
>>> unique_list(y)
["'None'", "'Vega~'"]

Then you can easily format it to whatever you want (i.e. to a string).
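As a rough sketch of how these steps could fit together, the snippet below strips the line, splits it on ][, removes duplicates, and rebuilds the bracketed form. The name dedupe_line and the shortened sample string are made up for illustration, and the code assumes the line starts with [ and ends with ].

```python
def unique_list(items):
    # Same idea as the question's unique_list: keep first occurrences, in order.
    ulist = []
    for item in items:
        if item not in ulist:
            ulist.append(item)
    return ulist


def dedupe_line(line):
    # dedupe_line is a hypothetical helper, not part of the question's code.
    # strip() removes the trailing newline that readlines() leaves on the line.
    body = line.strip()[1:-1]  # assumes the line starts with [ and ends with ]
    names = unique_list(body.split(']['))
    # Put the brackets back around each remaining name.
    return ''.join('[%s]' % name for name in names)


# Shortened sample in the same form as the file's first line.
sample = "['None']['Vega~']['Vega~']['8^)->-<']['None']['Lawn']\n"
result = dedupe_line(sample)
print(result)
```

If a space-separated string is preferred, ' '.join(names) can be returned in place of the bracketed form.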

Note that rstrip and lstrip each make a copy of the string and strip every matching character at that end, not just one. A single slice, x[1:-1], does the same job in one step, assuming you are 100% certain that the input is of the given form (starts with [ and ends with ]).

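Because unique_list tests membership against a list, this is O(n*k) for n names and k distinct names, rather than the O(n) you would get by hashing every word (adding to a Python set). For a file this size the difference is negligible, and it maintains the original order and gets to use your (pretty neat) unique_list function.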
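For comparison, the set-based approach mentioned above can also keep the original order if the set is used only for membership checks while a list records the output. A minimal sketch, where unique_ordered is a hypothetical name not taken from the question:

```python
def unique_ordered(items):
    # unique_ordered is an illustrative name; it is not from the question.
    seen = set()   # hashed membership checks, O(1) on average
    result = []    # records first occurrences in their original order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


names = ["'None'", "'Vega~'", "'Vega~'", "'Lawn'", "'None'"]
deduped = unique_ordered(names)
print(deduped)
```

This only pays off on much longer inputs; for a single line of names either version should be fine.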

Upvotes: 3

Amr Ayman

Reputation: 1159

You can do this:

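from collections import OrderedDict  # Python 2.7+; collections has no OrderedSet
def unique_list(l): return list(OrderedDict.fromkeys(l))  # keys keep first-seen order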

Also, typea is just a string with no whitespace. To split names, do this:

typea = [n for n in typea.strip().replace('[', '').split(']') if n] # typea is now a list, without the empty trailing item

Upvotes: 0

doru

Reputation: 9110

s = "['None']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['Funk']['8^)->-<']['8^)->-<']['8^)->-<']['Vega~']['violence']['Zanaz']['Funk']['puker']['Vega~']['Vega~']['Vega~']['8^)->-<']['violence']['puker']['Zanaz']['None']['Lawn']['Lawn']['Lawn']['Leafy']['Judge69']['David']['lilwade']['Pity.']['artofwar']['Hazecloud']['Lawn']['Lawn']['Lawn']['Judge69']['Leafy']['David']['lilwade']['Hazecloud']['Lawn']['Lawn']['Lawn']['Leafy']['David']['Pity.']['lilwade']['artofwar']['Judge69']"
ss = s[1:-1]
l = []
for i in ss.split(']['):
    if i not in l:
        l.append(i)
r = ' '.join(l)

With the result:

"'None' 'Vega~' '8^)->-<' 'violence' 'puker' 'Zanaz' 'Funk' 'Lawn' 'Leafy' 'Judge69' 'David' 'lilwade' 'Pity.' 'artofwar' 'Hazecloud'"

Upvotes: 0
