sharp

Reputation: 2158

Python - Merge list of tuples from nested list

I have a list of lists of tuples that I want to merge. The code below combines consecutive words that share a tag when a single flat list is passed in as 'classified_text'. How do I extend this to a nested list of tuples? I tried adding another for loop and the append method, but I get a different error. Is there a simple way to do this? Thanks!

Input Text 1 - Working:

classified_text = [('John', 'PERSON'), ('Smith', 'PERSON'),('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('ABC', 'ORGANIZATION')] # Single list

Output Text 1 - Working:

[('PERSON      ', 'John Smith'), ('ORGANIZATION', 'University of ABC')]

Input Text 2 - Not Working: Nested list with tuples

classified_text = [[('John', 'PERSON'), ('Smith', 'PERSON')], [('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('ABC', 'ORGANIZATION')], [('some', 'O'), ('text', 'O'), ('here', 'O')], [('Mark', 'O'), ('from', 'O'), ('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('CA', 'ORGANIZATION')]]

Code:

from itertools import groupby
entity_extracted_words = []
for tag, chunk in groupby(classified_text, lambda x:x[1]):
    if tag != "O":
        info_ner = "%-12s"%tag, " ".join(w for w, t in chunk)
        entity_extracted_words.append(info_ner)

print('entity_extracted_words:\n', entity_extracted_words)

Output Text 2 - Trying to get this result:

[('PERSON      ', 'John Smith'), ('ORGANIZATION', 'University of ABC'), ('ORGANIZATION', 'University of CA')]

Error: TypeError: not all arguments converted during string formatting

Upvotes: 2

Views: 238

Answers (2)

Stephen C

Reputation: 2036

Try something like this. Simply loop over the sublists, join the words of each sublist into a single string, and append the result to newlist.

classified_text = [[('John', 'PERSON'), ('Smith', 'PERSON')], 
                   [('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('ABC', 'ORGANIZATION')],
                   [('some', 'O'), ('text', 'O'), ('here', 'O')],
                   [('Mark', 'O'), ('from', 'O'), ('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('CA', 'ORGANIZATION')]]

newlist = []
for sublist in classified_text:
    combined = []
    for chunk, tag in sublist:
        if tag == 'O':
            continue
        combined_tag = tag
        combined.append(chunk)

    # Append the tag and the joined string to the result list
    # (skipped when the sublist contained only 'O' tags)
    if combined:
        # To pad the tag with spaces as in your example, use the
        # string's ljust method
        newlist.append((combined_tag.ljust(12), ' '.join(combined)))

print(newlist)

#[('PERSON      ', 'John Smith'),
# ('ORGANIZATION', 'University of ABC'),
# ('ORGANIZATION', 'University of CA')]
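
One thing to be aware of: the loop above joins every non-'O' word in a sublist under the last tag it saw, which is fine for the sample data. If a sublist could hold more than one kind of entity, a possible variant is to run groupby inside each sublist, as in the rough sketch below. The helper name merge_entities is made up for illustration and is not part of the original code.

```python
from itertools import groupby


def merge_entities(nested):
    """Hypothetical helper: merge consecutive same-tag words within each sublist."""
    merged = []
    for sublist in nested:
        # groupby only groups *consecutive* items that share a key,
        # so separate runs inside one sublist stay separate.
        for tag, chunk in groupby(sublist, key=lambda pair: pair[1]):
            if tag != 'O':
                words = ' '.join(word for word, _ in chunk)
                merged.append((tag.ljust(12), words))
    return merged


classified_text = [[('John', 'PERSON'), ('Smith', 'PERSON')],
                   [('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('ABC', 'ORGANIZATION')],
                   [('some', 'O'), ('text', 'O'), ('here', 'O')],
                   [('Mark', 'O'), ('from', 'O'), ('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('CA', 'ORGANIZATION')]]

print(merge_entities(classified_text))
```

For the sample input this should give the same three tuples as the loop above.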

Upvotes: 2

kabdulla

Reputation: 5429

You could first flatten your list of lists into a single list:

flat_list = [item for sublist in classified_text for item in sublist]

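That flat list should then work with your original code. The TypeError arises because, with the nested list, x[1] in the groupby key is itself a (word, tag) tuple rather than a tag string, so "%-12s" % tag receives two values for one placeholder.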
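
Put together, it might look like the sketch below. It uses itertools.chain.from_iterable, which is an equivalent way to flatten one level, and otherwise keeps the loop from the question. One assumption to check against your data: once the list is flat, a run of the same tag that ends one sublist and starts the next would be merged into a single entry.

```python
from itertools import chain, groupby

classified_text = [[('John', 'PERSON'), ('Smith', 'PERSON')],
                   [('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('ABC', 'ORGANIZATION')],
                   [('some', 'O'), ('text', 'O'), ('here', 'O')],
                   [('Mark', 'O'), ('from', 'O'), ('University', 'ORGANIZATION'), ('of', 'ORGANIZATION'), ('CA', 'ORGANIZATION')]]

# Flatten one level: list of lists of tuples -> list of tuples
flat_list = list(chain.from_iterable(classified_text))

entity_extracted_words = []
# Each item is now a (word, tag) tuple, so pair[1] is the tag string
for tag, chunk in groupby(flat_list, key=lambda pair: pair[1]):
    if tag != 'O':
        info_ner = ('%-12s' % tag, ' '.join(word for word, _ in chunk))
        entity_extracted_words.append(info_ner)

print('entity_extracted_words:\n', entity_extracted_words)
```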

Upvotes: 0
