Reputation: 513
I have a few tuples that look like this. I would like to combine all the words that belong to the same sentence.
('1.txt','sentence 1.1','city')
('1.txt','sentence 1.1','apple')
('1.txt','sentence 1.1','ok')
('1.txt','sentence 1.2','go')
('1.txt','sentence 1.2','home')
('1.txt','sentence 1.2','city')
('2.txt','sentence 2.1','sign')
('2.txt','sentence 2.1','tree')
('2.txt','sentence 2.1','cat')
('2.txt','sentence 2.2','good')
('2.txt','sentence 2.2','image')
How can I combine the words by sentence? For example:
('1.txt','sentence 1.1','city apple ok')
('1.txt','sentence 1.2','go home city')
('2.txt','sentence 2.1','sign tree cat')
('2.txt','sentence 2.2','good image')
Or perhaps as lists, like this:
['1.txt','sentence 1.1',['city','apple','ok']]
['1.txt','sentence 1.2',['go','home','city']]
['2.txt','sentence 2.1',['sign', 'tree', 'cat']]
['2.txt','sentence 2.2',['good', 'image']]
And if I wanted to convert the result into a dictionary instead, how would I do that?
Upvotes: 1
Views: 947
Reputation: 3146
You can try this
l = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.2', 'home'),
    ('1.txt', 'sentence 1.2', 'city'),
    ('2.txt', 'sentence 2.1', 'sign'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('2.txt', 'sentence 2.1', 'cat'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]
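d = {}
for i in l:
    # Key on file name + sentence id, e.g. "1.txt sentence 1.1"
    myKey = i[0] + " " + i[1]
    if myKey in d:
        d[myKey].append(i[2])
    else:
        # First word of this sentence: start its list with that word
        d[myKey] = [i[2]]
ans = []
for k in d:
    # The sentence id itself contains a space, so split only at the first one
    v = k.split(" ", 1)
    ans.append([v[0], v[1], d[k]])
print(sorted(ans))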
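A minimal sketch of the same dictionary approach, assuming you are free to choose the key: using the (file, sentence) tuple itself as the key, rather than a space-joined string, means nothing has to be split apart again at the end, and dict.setdefault replaces the if/else. The variable names are illustrative only.

```python
# The (file, sentence, word) tuples from the question
l = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.2', 'home'),
    ('1.txt', 'sentence 1.2', 'city'),
    ('2.txt', 'sentence 2.1', 'sign'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('2.txt', 'sentence 2.1', 'cat'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]

d = {}
for file_name, sentence_id, word in l:
    # setdefault stores an empty list the first time a key is seen,
    # then returns the stored list so the word can be appended to it
    d.setdefault((file_name, sentence_id), []).append(word)

# Turn each ((file, sentence), words) entry into [file, sentence, words]
ans = sorted([f, s, words] for (f, s), words in d.items())
print(ans)
```

Because the key is already a tuple, the words of each sentence stay in their original order and no string splitting is needed.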
Upvotes: 0
Reputation: 214957
You can also use itertools.groupby with the first two elements of each tuple as the key, assuming your list of tuples is already sorted by those first two elements beforehand:
from itertools import groupby
# lst is the list of (file, sentence, word) tuples, already sorted by its first two fields
[[k[0], k[1], [i[2] for i in g]] for k, g in groupby(lst, key=lambda x: x[:2])]
#[['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']],
# ['1.txt', 'sentence 1.2', ['go', 'home', 'city']],
# ['2.txt', 'sentence 2.1', ['sign', 'tree', 'cat']],
# ['2.txt', 'sentence 2.2', ['good', 'image']]]
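One caveat worth showing: groupby only merges consecutive items with equal keys, so unsorted input would produce split groups. Below is a sketch that sorts first (Python's sorted is stable, so words keep their relative order within a sentence) and also builds the space-joined form from the question. Here lst is a deliberately shuffled subset of the question's tuples, used only for illustration.

```python
from itertools import groupby
from operator import itemgetter

# A shuffled subset of the question's tuples (illustrative input)
lst = [
    ('2.txt', 'sentence 2.2', 'good'),
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('2.txt', 'sentence 2.2', 'image'),
    ('1.txt', 'sentence 1.1', 'ok'),
]

key = itemgetter(0, 1)  # (file, sentence) pair

# Sort so that equal keys become adjacent; the stable sort keeps word order
rows = sorted(lst, key=key)

# List form: [file, sentence, [words...]]
grouped = [[f, s, [w for _, _, w in g]] for (f, s), g in groupby(rows, key=key)]

# String form from the question: (file, sentence, 'word word ...')
joined = [(f, s, ' '.join(words)) for f, s, words in grouped]
print(grouped)
print(joined)
```

Sorting costs O(n log n); if the data is already sorted, as the answer assumes, the sort can be dropped.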
Upvotes: 2
Reputation:
Based on your input data, it seems like the words are keyed by the combination of the first and second items (indices 0 and 1) of each tuple.
You can build a dictionary mapping this item combination to words, and do some post-processing to reformat the data into the structure you want.
Here's a procedural, O(n) approach.
import collections
# input_data is the list of (file_name, sentence_id, word) tuples from the question
sentences = collections.defaultdict(list)
for file_name, sentence_id, word in input_data:
    sentences[(file_name, sentence_id)].append(word)
# sentences is now formatted like {('1.txt', 'sentence 1.1'): ['city', 'apple', 'ok'], ...}
for key, val in sentences.items():
    print(list(key) + [val])
# ['1.txt', 'sentence 1.1', ['city', 'apple', 'ok']]
# ...
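Since the question also asks about a dictionary: the sentences mapping above is already a dict keyed by (file_name, sentence_id). If a nested layout is more convenient, here is a sketch that groups by file, then by sentence, and joins each sentence's words with spaces. The name by_file and the nested shape are my own illustrative choices, not something the question prescribes.

```python
import collections

# The (file_name, sentence_id, word) tuples from the question
input_data = [
    ('1.txt', 'sentence 1.1', 'city'),
    ('1.txt', 'sentence 1.1', 'apple'),
    ('1.txt', 'sentence 1.1', 'ok'),
    ('1.txt', 'sentence 1.2', 'go'),
    ('1.txt', 'sentence 1.2', 'home'),
    ('1.txt', 'sentence 1.2', 'city'),
    ('2.txt', 'sentence 2.1', 'sign'),
    ('2.txt', 'sentence 2.1', 'tree'),
    ('2.txt', 'sentence 2.1', 'cat'),
    ('2.txt', 'sentence 2.2', 'good'),
    ('2.txt', 'sentence 2.2', 'image'),
]

# Outer dict keyed by file name; inner dict keyed by sentence id -> list of words
by_file = collections.defaultdict(lambda: collections.defaultdict(list))
for file_name, sentence_id, word in input_data:
    by_file[file_name][sentence_id].append(word)

# Convert to plain dicts, joining each sentence's words with single spaces
result = {f: {s: ' '.join(words) for s, words in sents.items()}
          for f, sents in by_file.items()}
print(result)
```

Looking up result['1.txt']['sentence 1.2'] then gives 'go home city'. Keep the inner lists (skip the join) if you need the individual words.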
Upvotes: 2