Reputation: 81
I have an array of key:value pairs which I am generating using a loop over the contents (entity extraction) of documents.
entity_array.append({
    "key": entity.label_,
    "value": entity.text
})
I would like to add a check so that an entry is not appended if the same key and value pair already exists, but I'm unsure how to check on key AND value together. The reason is that I am getting a lot of duplicate rows.
I'm able to check whether the key OR the value exists, but this doesn't give the desired result, as an entity could belong to multiple keys.
Any help appreciated.
Upvotes: 0
Views: 5199
Reputation: 4548
It sounds like the data structure you are using is causing you some issues. If you want to keep track of duplicate combinations of entity.label_ and entity.text values, consider treating the combination as a namedtuple and using a set to quickly check for duplicates:
import collections

Entity = collections.namedtuple("Entity", ["key", "value"])  # a tuple called "Entity" with named elements
entity_set = set()  # empty set where we will store deduplicated combinations of label and text
for entity in your_iterable_here:
    # add to the set if it's not there already, otherwise do nothing
    entity_set.add(Entity(key=entity.label_, value=entity.text))
You can even do this as a one-liner if you want:
entity_set = set(Entity(key=entity.label_, value=entity.text) for entity in your_iterable_here)
When you are done, you will have a collection of unique key/value pairs in entity_set. If you absolutely need the entities in the data structure mentioned in the OP (a list of dicts), one option is to take advantage of the namedtuple._asdict() function (which, despite the underscore in the name, is a fully documented function and a part of the "public" namedtuple interface):
entity_array = [entity._asdict() for entity in entity_set]
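As a quick check of the approach, the following is a minimal sketch that runs without a real document pipeline. FakeEntity is a hypothetical stand-in for whatever object exposes label_ and text in your loop, and the sample values are made up for illustration.

```python
import collections

# Hypothetical stand-in for the objects yielded by your entity extraction;
# only the label_ and text attributes are assumed.
FakeEntity = collections.namedtuple("FakeEntity", ["label_", "text"])

Entity = collections.namedtuple("Entity", ["key", "value"])

your_iterable_here = [
    FakeEntity("ORG", "Apple"),
    FakeEntity("ORG", "Apple"),      # exact duplicate: same key and same value
    FakeEntity("PRODUCT", "Apple"),  # same value under a different key: kept
    FakeEntity("ORG", "Google"),
]

# Only exact (key, value) duplicates collapse into one element.
entity_set = {Entity(key=e.label_, value=e.text) for e in your_iterable_here}

# Sets are unordered, so sort here only to get a stable result for display.
entity_array = [entity._asdict() for entity in sorted(entity_set)]
print(entity_array)
```

The pair ("ORG", "Apple") appears once in the result, while ("PRODUCT", "Apple") survives because its key differs.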
There are two caveats to this solution:
Whatever entity.label_ and entity.text are, they must be hashable to be put into a set. There are ways around this if the things you are storing are not simple values like strings, but it can get complicated.
The order of your_iterable_here will not be preserved. There are easy ways around this, like using an OrderedDict with Entity keys and bool values instead of a set.
Upvotes: 1
Reputation: 11
You'll have to check two conditions: (a) the key is not present in the target dictionary, and (b) the key is present but the value is different. In both cases, you will have to add the new value to the dictionary.
For example, suppose main_dict is your main dictionary, and values_to_add below is a new dictionary that has some values that need to be added to main_dict. The code below does this:
main_dict = {
    "Key_1": "Value_1",
    "Key_2": "Value_2",
    "Key_3": "Value_3"
}
values_to_add = {
    "Key_1": "Value_X",
    "Key_4": "Value_4"
}
for key, value in values_to_add.items():
    if key not in main_dict:
        # (a) new key: add it
        main_dict[key] = value
    elif main_dict[key] != value:
        # (b) existing key with a different value: overwrite it
        main_dict[key] = value
print(main_dict)
Upvotes: 1
Reputation: 631
You can implement your own function for that. For example, you can call the get method with the given key and compare the returned value with your expected value:
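_MISSING = object()  # sentinel so that a missing key never compares equal to a real value such as None

def exists(dict_: dict, key: str, value: object) -> bool:
    return dict_.get(key, _MISSING) == value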
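The helper above checks a single dict. To apply the same key-and-value check to the list of dicts from the question, one possible sketch keeps a set of (key, value) tuples that have already been appended and skips any pair found there. The name append_unique and the sample pairs are assumptions for illustration, and the values are assumed to be hashable (plain strings here).

```python
def append_unique(entity_array, seen, key, value):
    """Append {"key": key, "value": value} unless that exact pair was seen."""
    pair = (key, value)
    if pair in seen:
        return False  # same key AND same value already appended
    seen.add(pair)
    entity_array.append({"key": key, "value": value})
    return True


entity_array = []
seen = set()
sample_pairs = [("ORG", "Apple"), ("ORG", "Apple"), ("PRODUCT", "Apple")]
for key, value in sample_pairs:
    append_unique(entity_array, seen, key, value)

print(entity_array)
# [{'key': 'ORG', 'value': 'Apple'}, {'key': 'PRODUCT', 'value': 'Apple'}]
```

Because the list is only appended to when a pair is new, the first-seen order of the entities is kept.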
Upvotes: 0