I have a text file. All of it looks like this (the question has been edited; this is not what the file originally looked like):
(0, 16, 0)
(0, 17, 0)
(0, 18, 0)
(0, 19, 0)
(0, 20, 0)
(0, 21, 0)
(0, 22, 0)
(0, 22, 1)
(0, 22, 2)
(0, 23, 0)
(0, 23, 4)
(0, 24, 0)
(0, 25, 0)
(0, 25, 1)
(0, 26, 0)
(0, 26, 3)
(0, 26, 4)
(0, 26, 5)
(0, 26, 9)
(0, 27, 0)
(0, 27, 1)
How do I put these values into a set in Python 2?
My most recent attempt was:
om_set = set(open('Rye Grass.txt').read())
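For context on why that attempt fails: set() iterates over whatever it is given, and iterating over a string yields single characters, so you get a set of characters instead of tuples. A tiny demonstration, using a made-up two-line string in place of the file contents:

```python
from __future__ import print_function

# Made-up file contents, standing in for open(...).read()
text = "(0, 16, 0)\n(0, 17, 0)\n"

# set() over a string collects its individual characters, not tuples
chars = set(text)
print(sorted(chars))
```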
EDIT: This is the code I used to generate my text file:
import cv2
import numpy as np
import time
om=cv2.imread('spectrum1.png')
om=om.reshape(1,-1,3)
om_list=om.tolist()
om_tuple={tuple(item) for item in om_list[0]}
om_set=set(om_tuple)
im=cv2.imread('1.jpg')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
with open('Weedlist', 'a') as outputfile:
    output = ', '.join([str(tup) for tup in sorted(ColourCount)])
    outputfile.write(output)
print 'done'
im=cv2.imread('2.jpg')
im=cv2.resize(im,(100,100))
im= im.reshape(1,-1,3)
im_list=im.tolist()
im_tuple={tuple(item) for item in im_list[0]}
ColourCount= om_set & set(im_tuple)
with open('Weedlist', 'a') as outputfile:
    output = ', '.join([str(tup) for tup in sorted(ColourCount)])
    outputfile.write(output)
print 'done'
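Side note on the writing code: each block appends its tuples joined by ', ' but writes nothing after the last one, so the next write continues directly with ...)(0, ... on the same line. A minimal sketch of a writer that puts one tuple per line instead, assuming that layout is what you want; write_colours is a hypothetical helper name, and the cv2 part is left out so the sketch runs on its own:

```python
from __future__ import print_function


def write_colours(path, colours):
    """Append each colour tuple on its own line (hypothetical helper)."""
    with open(path, 'a') as outputfile:
        for tup in sorted(colours):
            # str((0, 16, 0)) gives '(0, 16, 0)'; the newline keeps runs apart
            outputfile.write(str(tup) + '\n')

# The two repeated blocks could then become one loop, e.g.:
# for name in ['1.jpg', '2.jpg']:
#     ... build im_tuple with cv2 as in the question ...
#     write_colours('Weedlist', om_set & set(im_tuple))
```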
As @TimPietzcker suggested, and trusting the file to contain only these fixed representations of integers in comma-separated triplets surrounded by parentheses, here is a simple parser that does it in one go (the OP's question also used a greedy read of the whole file into memory):
#! /usr/bin/env python
from __future__ import print_function
infile = 'pixel_int_tuple_reps.txt'
split_pits = None
with open(infile, 'rt') as f_i:
    split_pits = [z.strip(' ()') for z in f_i.read().strip().split('),')]
if split_pits:
    on_set = set(tuple(int(z.strip())
                       for z in tup.split(', ')) for tup in split_pits)
    print(on_set)
It transforms:
(0, 19, 0), (0, 20, 0), (0, 21, 1), (0, 22, 0), (0, 24, 3), (0, 27, 0), (0, 29, 2), (0, 35, 2), (0, 36, 1)
into:
set([(0, 27, 0), (0, 36, 1), (0, 21, 1), (0, 22, 0), (0, 24, 3), (0, 19, 0), (0, 35, 2), (0, 29, 2), (0, 20, 0)])
The small snippet:
splits the text into one substring per pixel integer triplet, such as 0, 19, 0,
strips away the stray parentheses and spaces (this also takes care of the closing parenthesis at the very end), and
if that produced anything, splits each substring into its rgb components, converts them to integers and feeds the resulting tuples into a set.
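To make those steps concrete, here is the same pipeline run on a short sample string (made up for illustration), with each intermediate result kept in its own variable:

```python
from __future__ import print_function

# Short made-up sample in the comma-separated layout the parser expects
text = "(0, 19, 0), (0, 20, 0), (0, 21, 1)\n"

# Step 1: split at each "),", giving '(0, 19, 0', ' (0, 20, 0', ' (0, 21, 1)'
pieces = text.strip().split('),')
# Step 2: strip stray spaces and parentheses from both ends of each piece
split_pits = [z.strip(' ()') for z in pieces]
# Step 3: split each '0, 19, 0' at ', ', convert to int and collect tuples
on_set = set(tuple(int(z.strip()) for z in tup.split(', '))
             for tup in split_pits)

print(pieces)
print(split_pits)
print(sorted(on_set))
```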
I would think twice before using eval/exec for this kind of deserialization task.
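If you do want the text evaluated as Python literals, a safer option than eval is ast.literal_eval from the standard library (available since Python 2.6): it accepts only literal structures such as tuples, lists and numbers, and rejects anything else, for example a function call, with a ValueError. A sketch, assuming the file holds one comma-separated run of tuples like the sample above; load_tuple_set is a hypothetical name:

```python
from __future__ import print_function
import ast


def load_tuple_set(text):
    """Parse text such as '(0, 19, 0), (0, 20, 0)' into a set of tuples."""
    # Wrap in brackets so one tuple or many both parse as a list of tuples
    values = ast.literal_eval('[' + text.strip() + ']')
    return set(tuple(v) for v in values)


print(sorted(load_tuple_set("(0, 19, 0), (0, 20, 0), (0, 19, 0)")))
# prints [(0, 19, 0), (0, 20, 0)]
```

For the file itself: with open('Rye Grass.txt') as f: om_set = load_tuple_set(f.read()). Wrapping the text in brackets means a file holding a single tuple still yields a set containing that tuple, rather than a set of its three integers.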
Update, as suggested by comments from the OP (please update the question!):
... so until we have further info from the OP:
For a theoretically clean dump file of 3-int tuples this answer works (as long as the file is not too big to load at once and map into a set).
For the concrete task, I may update the answer once sufficient new info has been added to the question ;-)
If the triple "lines" are concatenated from previous stages, with or without a separating newline but always missing the comma, one way is to change the file-reading part in one of two ways. Either tackle each chunk in "isolation" and merge the results into the set (s being the set collected so far and fresh the set parsed from the current chunk):
s = s | fresh
or, if these "chunks" are added like so: (0, 1, 230)(13, ... (that is, with a )( junction), go "hitting hard": instead of
f_i.read().strip().split('),')
write
f_i.read().replace(')(', '), (').strip().split('),')
... that is, "fix" each )( into a ), ( so that you can continue as if the file had a homogeneous structure.
Update: now parsing version 2 of the dataset (from the updated question):
The file pixel_int_tuple_reps_v2.txt now contains:
(0, 16, 0)
(0, 17, 0)
(0, 18, 0)
(0, 19, 0)
(0, 20, 0)
(0, 21, 0)
(0, 22, 0)
(0, 22, 1)
(0, 22, 2)
(0, 23, 0)
(0, 23, 4)
(0, 24, 0)
(0, 25, 0)
(0, 25, 1)
(0, 26, 0)
(0, 26, 3)
(0, 26, 4)
(0, 26, 5)
(0, 26, 9)
(0, 27, 0)
(0, 27, 1)
The code:
#! /usr/bin/env python
from __future__ import print_function
infile = 'pixel_int_tuple_reps_v2.txt'
on_set = set()
with open(infile, 'rt') as f_i:
    for line in f_i:
        # Drop surrounding whitespace and the enclosing parentheses
        rgb_line = line.strip().lstrip('(').rstrip(')')
        try:
            # Turn "0, 16, 0" into the tuple (0, 16, 0); the set drops duplicates
            on_set.add(tuple(int(z) for z in rgb_line.split(',')))
        except ValueError:
            # Blank or malformed lines cannot be converted to int
            print("Ignored: " + rgb_line)
print(len(on_set))
for rgb in sorted(on_set):
    print(rgb)
This parses the file and prints first the number of elements in the set, then the elements themselves in sorted order:
21
(0, 16, 0)
(0, 17, 0)
(0, 18, 0)
(0, 19, 0)
(0, 20, 0)
(0, 21, 0)
(0, 22, 0)
(0, 22, 1)
(0, 22, 2)
(0, 23, 0)
(0, 23, 4)
(0, 24, 0)
(0, 25, 0)
(0, 25, 1)
(0, 26, 0)
(0, 26, 3)
(0, 26, 4)
(0, 26, 5)
(0, 26, 9)
(0, 27, 0)
(0, 27, 1)
HTH. Note that there are no duplicates in the provided sample input; when I doubled the last data line I still received 21 unique elements as output, so I guess it now works as designed ;-)
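A different technique from the one above, in case the layout keeps changing: a regular expression that picks out every parenthesised integer triple no matter what separates them, so one function reads the comma-separated dump, the )( concatenations and the one-tuple-per-line version 2 file. A sketch, assuming the values are non-negative integers (true for 8-bit pixel channels); read_pixel_set and TRIPLE are hypothetical names:

```python
from __future__ import print_function
import re

# Three non-negative integers inside parentheses, with optional spaces
TRIPLE = re.compile(r'\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)')


def read_pixel_set(text):
    """Collect every (a, b, c) integer triple found in text into a set."""
    # findall returns one tuple of three digit strings per match
    return set(tuple(int(g) for g in m) for m in TRIPLE.findall(text))


# Mixed layouts: newline-separated, )( junction and comma-separated
sample = "(0, 26, 9)\n(0, 27, 0)(0, 27, 1), (0, 27, 1)\n"
print(len(read_pixel_set(sample)))  # prints 3
```

For the file: with open('pixel_int_tuple_reps_v2.txt') as f_i: on_set = read_pixel_set(f_i.read()).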
You only need a small modification. You can try this:
om_set = set(eval(open('abc.txt').read()))
Result
{(0, 19, 0),
(0, 20, 0),
(0, 21, 1),
(0, 22, 0),
(0, 24, 3),
(0, 27, 0),
(0, 29, 2),
(0, 35, 2)}
Edit
Here is the code working in an IPython prompt:
In [1]: file_ = open('abc.txt')
In [2]: text_read = file_.read()
In [3]: print eval(text_read)
((0, 19, 0), (0, 20, 0), (0, 21, 1), (0, 22, 0), (0, 24, 3), (0, 27, 0), (0, 29, 2), (0, 35, 2), (0, 36, 1))
In [4]: type(eval(text_read))
Out[4]: tuple
In [5]: print set(eval(text_read))
set([(0, 27, 0), (0, 36, 1), (0, 21, 1), (0, 22, 0), (0, 24, 3), (0, 19, 0), (0, 35, 2), (0, 29, 2), (0, 20, 0)])