Reputation: 99
I have a file with duplicate lines. What I want is to delete the duplicates so I have a file with unique lines, but I get an error:
output.writelines(uniquelines(filelines))
TypeError: writelines() argument must be a sequence of strings
I have searched for similar issues but I still don't understand what is wrong. My code:
import codecs

def uniquelines(lineslist):
    unique = {}
    result = []
    for item in lineslist:
        if item.strip() in unique: continue
        unique[item.strip()] = 1
        result.append(item)
    return result

file1 = codecs.open('organizations.txt', 'r+', 'cp1251')
filelines = file1.readlines()
file1.close()

output = open("wordlist_unique.txt", "w")
output.writelines(uniquelines(filelines))
output.close()
Upvotes: 3
Views: 6791
Reputation: 25
Hello, I got another solution:
For this file:
01 WLXB64US
01 WLXB64US
02 WLWB64US
02 WLWB64US
03 WLXB67US
03 WLXB67US
04 WLWB67US
04 WLWB67US
05 WLXB93US
05 WLXB93US
06 WLWB93US
06 WLWB93US
Solution:
def deleteDuplicate():
    try:
        f = open('file.txt', 'r')
        lstResul = f.readlines()
        f.close()

        datos = []
        for lstRspn in lstResul:
            datos.append(lstRspn)

        lstSize = len(datos)
        i = 0
        f = open('file.txt', 'w')
        while i < lstSize:
            if i == 0:
                f.writelines(datos[i])
            else:
                if (str(datos[i-1].strip())).replace(' ', '') == (str(datos[i].strip())).replace(' ', ''):
                    print('next...')
                else:
                    f.writelines(datos[i])
            i = i + 1
        f.close()
    except Exception as err:
        print(err)
Upvotes: 0
Reputation: 2922
It is rather common in Python to remove duplicate objects from a sequence using a set. The only downside to using a set is that you lose order (the same way you lose order in dictionary keys; in fact it's the exact same reason, but that's not important). If the order in your files matters, you can use the keys of an OrderedDict (in the standard library as of... 2.7, I think) to act as a pseudo-set and remove duplicate strings from a sequence of strings. If order does not matter, use set() instead of collections.OrderedDict.fromkeys(). By using the file modes 'rb' (read binary) and 'wb' (write binary), you stop having to worry about encoding: Python will just treat the lines as bytes. This uses context-manager syntax introduced after 2.5, so you may need to adjust with contextlib as needed if this is a syntax error for you.
import collections

with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
    outf.writelines(collections.OrderedDict.fromkeys(inf))
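If order does not matter, a minimal sketch of the set() variant (same infile/outfile names as above; line order is not preserved):
with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
    outf.writelines(set(inf))  # duplicates removed, but line order is lost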
Upvotes: 0
Reputation: 10694
I wouldn't bother encoding or decoding at all. Simply open with open('organizations.txt', 'rb')
as well as open('wordlist_unique.txt', 'wb')
and you should be fine.
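For example, a minimal sketch of the question's code with binary modes (reusing the uniquelines function from the question; the wiring is an assumption):
file1 = open('organizations.txt', 'rb')      # read raw bytes, no decoding
filelines = file1.readlines()
file1.close()

output = open('wordlist_unique.txt', 'wb')   # write raw bytes, no encoding
output.writelines(uniquelines(filelines))
output.close()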
Upvotes: 1
Reputation: 703
If you don't need to have the lines in order afterwards, I suggest you put the strings in a set: set(linelist). The line order would be scrambled, but the duplicates would be gone.
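A minimal sketch of that approach, reusing the question's filenames and cp1251 encoding (an assumption; line order will not be preserved):
import codecs

with codecs.open('organizations.txt', 'r', 'cp1251') as f:
    unique = set(f)    # set of lines: duplicates removed, order lost
with codecs.open('wordlist_unique.txt', 'w', 'cp1251') as output:
    output.writelines(unique)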
Upvotes: 0
Reputation: 368904
The code uses two different opens: codecs.open when it reads, and plain open when it writes.
readlines on a file object created with codecs.open returns a list of unicode strings, while writelines on a file object created with open expects a sequence of (byte) strings.
Replace the following lines:
output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()
with:
output = codecs.open("wordlist_unique.txt", "w", "cp1251")
output.writelines(uniquelines(filelines))
output.close()
or preferably (using a with statement):
with codecs.open("wordlist_unique.txt", "w", "cp1251") as output:
    output.writelines(uniquelines(filelines))
Upvotes: 3