Amir Parkar

Reputation: 51

Conditionally concatenate string values of a tuple in a list in python based on the elements

Here is a list of (word, tag) tuples, where each tag gives the word type:

t = [('The','OTHER'),('name','OTHER'),('is','OTHER'),('Wall','ORGANIZATION'),('Mart','ORGANIZATION'),('and','OTHER'),('Thomas','ORGANIZATION'),('Cook','ORGANIZATION')]

The goal is to check whether consecutive tuples are tagged as 'ORGANIZATION' and, if so, concatenate their words with a space, repeating this over the entire list.

Expected output:

Wall Mart, Thomas Cook

org_list = ''
for x in t:
    if x[1] == 'ORGANIZATION':
        org_list = org_list + ' | ' + x[0]

I was only able to extract the individual names, but I can't find a way to concatenate the consecutive words tagged as organization.

I referred to another question: Concatenate elements of a tuple in a list in python


Upvotes: 1

Views: 172

Answers (2)

yatu

Reputation: 88236

Assuming there is always an 'OTHER' between two separate organization names, one approach is to use itertools.groupby to group consecutive tuples by their second element and str.join their first items whenever the grouping key is 'ORGANIZATION':

t = [('The','OTHER'),('name','OTHER'),('is','OTHER'),('Wall','ORGANIZATION'),
     ('Mart','ORGANIZATION'),('and','OTHER'),('Thomas','ORGANIZATION'),
     ('Cook','ORGANIZATION')]

from itertools import groupby
from operator import itemgetter as g

[' '.join(i[0] for i in v) for k, v in groupby(t, key=g(1)) if k == 'ORGANIZATION']
# ['Wall Mart', 'Thomas Cook']
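
To make the grouping step easier to follow, here is a small sketch that first materializes the runs groupby produces and then builds the comma-separated string from the question. The variable names runs and orgs are only illustrative, and a lambda is used in place of itemgetter purely for readability:

```python
from itertools import groupby

t = [('The', 'OTHER'), ('name', 'OTHER'), ('is', 'OTHER'),
     ('Wall', 'ORGANIZATION'), ('Mart', 'ORGANIZATION'), ('and', 'OTHER'),
     ('Thomas', 'ORGANIZATION'), ('Cook', 'ORGANIZATION')]

# groupby yields one (key, group) pair per run of consecutive equal keys;
# here the key is the tag, i.e. the second element of each tuple.
runs = [(tag, [word for word, _ in group])
        for tag, group in groupby(t, key=lambda pair: pair[1])]
# runs is:
# [('OTHER', ['The', 'name', 'is']), ('ORGANIZATION', ['Wall', 'Mart']),
#  ('OTHER', ['and']), ('ORGANIZATION', ['Thomas', 'Cook'])]

# Keep only the ORGANIZATION runs and join the words of each run.
orgs = [' '.join(words) for tag, words in runs if tag == 'ORGANIZATION']

print(', '.join(orgs))
# Wall Mart, Thomas Cook
```

Because groupby only looks at consecutive equal keys, two different organizations that appear directly next to each other, with no 'OTHER' in between, would end up merged into one name.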

If you prefer a for-loop solution without any imports, you can do the following. Note that it only works when every organization name consists of exactly two consecutive tagged words:

f = False
out = []
for i in t:
    if i[1] == 'ORGANIZATION':
        if not f:
            out.append(i[0])
            f = True
        else:
            out[-1] += f' {i[0]}'
            f = False

print(out)
# ['Wall Mart', 'Thomas Cook']
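
If the names can span one, two, or more words, one possible variation is to remember the previous tag instead of toggling a flag. This is only a sketch: the function name merge_tagged and the sample list below are made up for illustration and are not part of the question's data.

```python
def merge_tagged(pairs, tag='ORGANIZATION'):
    """Join runs of consecutive words carrying `tag` into single strings."""
    out = []
    prev_tag = None
    for word, word_tag in pairs:
        if word_tag == tag:
            if prev_tag == tag:
                # Still inside the same run: extend the last name.
                out[-1] += ' ' + word
            else:
                # A new run starts here.
                out.append(word)
        prev_tag = word_tag
    return out


# Hypothetical sample with a one-word and a three-word name.
sample = [('Acme', 'ORGANIZATION'), ('and', 'OTHER'),
          ('Wall', 'ORGANIZATION'), ('Mart', 'ORGANIZATION'),
          ('Stores', 'ORGANIZATION')]

print(merge_tagged(sample))
# ['Acme', 'Wall Mart Stores']
```

Like the groupby version, this still relies on a differently tagged word separating two distinct organization names.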

Upvotes: 2

Mykola Zotko

Reputation: 17814

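You can collect consecutive 'ORGANIZATION' words into sublists, starting a new sublist whenever a different tag follows a non-empty one, and then join each sublist: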

t = [('The','OTHER'),('name','OTHER'),('is','OTHER'),('Wall','ORGANIZATION'),('Mart','ORGANIZATION'),('and','OTHER'),('Thomas','ORGANIZATION'),('Cook','ORGANIZATION')]

result = [[]]
for i, j in t:
    if j == 'ORGANIZATION':
        result[-1].append(i)
    elif result[-1]:
        result.append([])

result = [' '.join(i) for i in result if i]
# ['Wall Mart', 'Thomas Cook']

Upvotes: 1
