Reputation: 487
I want to store and group different entities in a dictionary, given a sentence, each entity's start and end indices in the sentence, and its entity type.
I have a string like:
text = "My name is David and I live in Miami, but I was born in San Francisco"
I want to replace the PERSON and LOCATION entities in this string by index, using the following information:
entities = ['PERSON','LOCATION','LOCATION']
start = [11,31,56]
end = [16,36,69]
I've tried this:
from typing import List

def replace_by_index(text: str, entities: List, start: List, end: List):
    entities_dict = {}
    tmp = []
    for ent, st, ed in zip(entities, start, end):
        entities_dict[ent] = text[st:ed]  # overwrites any earlier value stored under ent
    return entities_dict
This obviously doesn't work, because each assignment overwrites the previous value for that key, so the first LOCATION is lost:
{'PERSON': 'David', 'LOCATION': 'San Francisco'}
I don't want the logic to depend on the entity values themselves, so no statements like:
    if ent == 'PERSON':
        # logic
That would not work in my case. I want something that works along these lines:
def replace_by_index(text: str, entities: List, start: List, end: List):
    entities_dict = {}
    tmp = []
    for ent, st, ed in zip(entities, start, end):
        entities_dict[ent] = tmp.append(text[st:ed])
    return entities_dict
This one returns:
{'PERSON': None, 'LOCATION': None}
DESIRED OUTPUT:
{'PERSON': ['David'], 'LOCATION': ['Miami','San Francisco']}
This dictionary is only an intermediate step. My actual problem is replacing all entities at the same time, given their indices. Once I have the dictionary, my next step would be to replace the words with their respective entities using string.replace(). Maybe there's a better approach?
The end goal would be to end up with a string like:
"My name is PERSON_0 and I live in LOCATION_0, but I was born in LOCATION_2"
Upvotes: 1
Views: 92
Reputation: 342
I agree with InfoLearner. It is much easier to set up the dictionary first and then append to it later. Here is another way to do it.
text = "My name is David and I live in Miami, but I was born in San Francisco"
entities = ['PERSON', 'LOCATION', 'LOCATION']
start = [11, 31, 56]
end = [16, 36, 69]
entities_dict = {ent: [] for ent in set(entities)}  # set up the dictionary with an empty list per entity type
for st, ed, ent in zip(start, end, entities):
    entities_dict[ent].append(text[st:ed])  # append the slice to the list for its entity type
print(entities_dict)
Output:
{'LOCATION': ['Miami', 'San Francisco'], 'PERSON': ['David']}
Upvotes: 1
Reputation: 15608
Try this inside your loop:
r = entities_dict.get(ent, [])  # existing list for this entity, or a new empty one
r.append(text[st:ed])
entities_dict[ent] = r
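For context, here is a minimal sketch of how those three lines might sit inside the loop from the question, with the dictionary name spelled entities_dict throughout. The function name group_entities is a hypothetical choice for illustration and does not come from the question.

```python
from typing import Dict, List


def group_entities(text: str, entities: List[str],
                   start: List[int], end: List[int]) -> Dict[str, List[str]]:
    """Group the text slices by entity label (hypothetical helper name)."""
    entities_dict: Dict[str, List[str]] = {}
    for ent, st, ed in zip(entities, start, end):
        r = entities_dict.get(ent, [])  # existing list for this label, or a new empty one
        r.append(text[st:ed])
        entities_dict[ent] = r          # store the list back (needed the first time a label is seen)
    return entities_dict


text = "My name is David and I live in Miami, but I was born in San Francisco"
grouped = group_entities(text, ['PERSON', 'LOCATION', 'LOCATION'],
                         [11, 31, 56], [16, 36, 69])
print(grouped)  # {'PERSON': ['David'], 'LOCATION': ['Miami', 'San Francisco']}
```

Nothing here branches on the entity values, so the same loop should handle any new labels without changes.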
A better approach is to create a dictionary that maps each span to its entity:
(start, end): entity
Loop over your sentence tokens and replace text[start:end] with dic[(start, end)], where dic is the dictionary you created.
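The span-dictionary idea could be sketched roughly as follows. This is one possible reading, not a definitive implementation: instead of looping over tokens, it walks the spans in order of their start index and rebuilds the string piece by piece. It assumes the spans do not overlap, and it assumes the numeric suffix counts occurrences per entity type (so the second location becomes LOCATION_1, whereas the question's example shows LOCATION_2). The names spans, counters, pieces and cursor are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def replace_by_index(text: str, entities: List[str],
                     start: List[int], end: List[int]) -> str:
    # Map each (start, end) span to its entity label.
    spans: Dict[Tuple[int, int], str] = {
        (st, ed): ent for ent, st, ed in zip(entities, start, end)
    }
    counters: Dict[str, int] = defaultdict(int)  # per-label counter (assumed numbering scheme)
    pieces: List[str] = []
    cursor = 0  # position in text up to which everything has been copied
    for st, ed in sorted(spans):  # left to right; assumes non-overlapping spans
        ent = spans[(st, ed)]
        pieces.append(text[cursor:st])              # untouched text before the span
        pieces.append(f"{ent}_{counters[ent]}")     # placeholder such as PERSON_0
        counters[ent] += 1
        cursor = ed
    pieces.append(text[cursor:])  # remainder after the last span
    return "".join(pieces)


text = "My name is David and I live in Miami, but I was born in San Francisco"
result = replace_by_index(text, ['PERSON', 'LOCATION', 'LOCATION'],
                          [11, 31, 56], [16, 36, 69])
print(result)
# My name is PERSON_0 and I live in LOCATION_0, but I was born in LOCATION_1
```

Because every slice is taken from the original string, earlier replacements cannot shift the later indices, which is the main risk with calling string.replace() repeatedly.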
Upvotes: 1