José Rodrigues

Reputation: 487

'Append' a value to a dictionary for a specific key

I want to store and group different entities in a dictionary, given a sentence, each entity's indices in the sentence, and its entity type.

I have a string like:

text = "My name is David and I live in Miami, but I was born in San Francisco"

I want to replace the PERSON and LOCATION entities in this string by index, using the following information:

entities = ['PERSON','LOCATION','LOCATION']
start = [11,31,56]
end = [16,36,69]

I've tried this:

def replace_by_index(text: str, entities: List ,start: List,end: List,):
    entities_dict = {}
    tmp = []
    for ent,st,ed in zip(entities,start,end):
        entities_dict[ent] = text[st:ed]
        
    return entities_dict

This doesn't work, because the first LOCATION gets overwritten:

{'PERSON': 'David', 'LOCATION': 'San Francisco'}

I don't want the logic to depend on the entity values, with statements like:

if ent == 'PERSON':
   #logic

This would not work in this case! I want something that could work like this:

def replace_by_index(text: str, entities: List ,start: List,end: List,):
    entities_dict = {}
    tmp = []
    for ent,st,ed in zip(entities,start,end):
        entities_dict[ent] = tmp.append(text[st:ed])
        
    return entities_dict

This one returns:

{'PERSON': None, 'LOCATION': None}

DESIRED OUTPUT:

{'PERSON': ['David'], 'LOCATION': ['Miami','San Francisco']}

This is my approach to a larger problem: replacing all entities at once, given their indices. Once I have this dictionary, my next step would be to replace the words with their respective entity labels using str.replace(). Maybe there's a better approach?

The end goal would be to end up with a string like:

"My name is PERSON_0 and I live in LOCATION_0, but I was born in LOCATION_2"

Upvotes: 1

Views: 92

Answers (2)

Joshua Hall

Reputation: 342

I agree with InfoLearner. It is much easier to set up the dictionary first and then append to it. Here is another way to do it.

text = "My name is David and I live in Miami, but I was born in San Francisco"
entities = ['PERSON', 'LOCATION', 'LOCATION']
start = [11, 31, 56]
end = [16, 36, 69]
entities_dict = {ent: [] for ent in set(entities)}  # set up dictionary
for st, ed, ent in zip(start, end, entities):
    entities_dict[ent].append(text[st:ed])  # append the slice to the item with the appropriate entity
print(entities_dict)

Output:

{'LOCATION': ['Miami', 'San Francisco'], 'PERSON': ['David']}
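A variation on the same idea, offered as a sketch rather than a definitive solution: `collections.defaultdict(list)` creates the empty list the first time a key is seen, so the dictionary does not need to be set up in advance. It reuses the question's data; keys end up in first-seen order rather than the arbitrary order of a `set`.

```python
from collections import defaultdict

text = "My name is David and I live in Miami, but I was born in San Francisco"
entities = ['PERSON', 'LOCATION', 'LOCATION']
start = [11, 31, 56]
end = [16, 36, 69]

# Missing keys are created on first access with an empty list as the value.
entities_dict = defaultdict(list)
for ent, st, ed in zip(entities, start, end):
    entities_dict[ent].append(text[st:ed])

# Convert back to a plain dict for printing.
print(dict(entities_dict))
```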

Upvotes: 1

InfoLearner

Reputation: 15608

Try this

r = entities_dict.get(ent, [])
r.append(text[st:ed])
entities_dict[ent] = r
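For context, here is a minimal sketch of how those three lines might sit inside the loop from the question. The function name `group_by_entity` is hypothetical, chosen only for this illustration.

```python
from typing import Dict, List


def group_by_entity(text: str, entities: List[str],
                    start: List[int], end: List[int]) -> Dict[str, List[str]]:
    """Group the text slices by entity type (hypothetical helper name)."""
    entities_dict: Dict[str, List[str]] = {}
    for ent, st, ed in zip(entities, start, end):
        # Fetch the existing list for this entity, or a new empty one.
        r = entities_dict.get(ent, [])
        r.append(text[st:ed])
        # Store the list back; this matters the first time a key is seen.
        entities_dict[ent] = r
    return entities_dict


text = "My name is David and I live in Miami, but I was born in San Francisco"
result = group_by_entity(text, ['PERSON', 'LOCATION', 'LOCATION'],
                         [11, 31, 56], [16, 36, 69])
print(result)
```

The three lines in the loop body can also be written as `entities_dict.setdefault(ent, []).append(text[st:ed])`, which does the same thing in one step.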

A better approach is to create a dictionary that maps each span to its entity:

(start, end): entity

Loop over your sentence tokens.

Replace text[start:end] with dic[(start, end)], where dic is the dictionary you created.
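One possible sketch of this idea, working on character offsets rather than tokens. The function name `replace_spans` is hypothetical, and numbering the labels per entity type is an assumption (so the second location becomes LOCATION_1). It also assumes the spans do not overlap. Replacing from right to left keeps the earlier offsets valid while the string changes length.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def replace_spans(text: str, entities: List[str],
                  start: List[int], end: List[int]) -> str:
    """Replace each (start, end) span with a numbered entity label."""
    counters: Dict[str, int] = defaultdict(int)
    spans: Dict[Tuple[int, int], str] = {}
    for ent, st, ed in zip(entities, start, end):
        # Assumption: labels are numbered per entity type, in input order.
        spans[(st, ed)] = f"{ent}_{counters[ent]}"
        counters[ent] += 1

    # Work from the rightmost span to the leftmost so that replacing one
    # span does not shift the offsets of the spans still to be processed.
    for (st, ed), label in sorted(spans.items(), reverse=True):
        text = text[:st] + label + text[ed:]
    return text


text = "My name is David and I live in Miami, but I was born in San Francisco"
replaced = replace_spans(text, ['PERSON', 'LOCATION', 'LOCATION'],
                         [11, 31, 56], [16, 36, 69])
print(replaced)
```

This avoids `str.replace()` entirely, which would also replace any other occurrence of the same word elsewhere in the sentence.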

Upvotes: 1
