Reputation:
I have a list. Each element of my list looks like this:
list[0] = {'Keywords': ' foster care case aide ',
           'categoryId': '1650',
           'result': {'categoryId': '1650',
                      'categoryName': 'case aide',
                      'score': '1.04134220123291'}}
Can I collect all keywords that have the same categoryId into the same group, and count how many keywords I have for each categoryId?
Please let me know if this is not possible.
Upvotes: 0
Views: 57
Reputation: 8790
You could use collections.defaultdict to make a set for each categoryId and add the associated words:
from collections import defaultdict
output = defaultdict(set)
# data is your list of dicts (avoid naming it "list", which shadows the built-in)
for entry in data:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)
I'm using a set because I assumed you don't want repeats of words within each categoryId. You could instead use a list or some other collection.
You can then get out the size of each ID:
for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')
# ID: 1650, words: 4
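As a rough sketch of the list alternative mentioned above, the following keeps repeated words, so the count reflects every occurrence rather than only distinct words. The sample entries in `data` and the name `words_by_id` are made up for illustration and are not from the question.

```python
from collections import defaultdict

# Hypothetical sample data, shaped like the entries in the question.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide jobs ', 'categoryId': '1650'},
    {'Keywords': ' truck driver ', 'categoryId': '2001'},
]

# A list per categoryId keeps duplicates; a set would drop them.
words_by_id = defaultdict(list)
for entry in data:
    # split() with no argument also copes with repeated spaces
    words_by_id[entry['categoryId']].extend(entry['Keywords'].split())

for category_id, words in words_by_id.items():
    print(f'ID: {category_id}, words: {len(words)}, unique: {len(set(words))}')
# ID: 1650, words: 7, unique: 5
# ID: 2001, words: 2, unique: 2
```

Whether a list or a set is the better fit depends on whether repeated words should count more than once.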
Responding to the comments from OP:
If you are getting KeyError: 'categoryId', that means some entries do not have the key 'categoryId'. If you want to simply skip those entries, you can add a small catch into the above solution:
for entry in data:
    # catch if there is a missing ID field
    if entry.get('categoryId', None) is None:
        continue
    # otherwise the same
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry['categoryId']].add(word)
If there is no categoryId, the entry will be skipped.
Note that this also depends on a "Keywords" field being present, so you may need to add a similar catch for that.
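One possible way to guard both fields at once is sketched below. It assumes an entry should be skipped entirely when either 'categoryId' or 'Keywords' is missing; the sample `data` is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: one entry lacks 'Keywords', one lacks 'categoryId'.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'categoryId': '1650'},
    {'Keywords': ' truck driver '},
]

output = defaultdict(set)
for entry in data:
    category_id = entry.get('categoryId')
    keywords = entry.get('Keywords')
    # skip entries that are missing either field
    if category_id is None or keywords is None:
        continue
    output[category_id].update(keywords.split())

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')
# ID: 1650, words: 4
```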
Or, if you want to collect all the keywords from entries without an ID, you can just use dict.get() in the original solution:
for entry in data:
    kwds = entry['Keywords'].strip().split(' ')
    for word in kwds:
        output[entry.get('categoryId', None)].add(word)
Now if there is no categoryId, the keywords will be added to the key None in output.
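To illustrate how the None key behaves, here is a small sketch with made-up sample entries; the name `no_id_words` is only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: the second entry has no 'categoryId'.
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' truck driver '},
]

output = defaultdict(set)
for entry in data:
    # entry.get() returns None when 'categoryId' is absent
    output[entry.get('categoryId')].update(entry['Keywords'].split())

# Words from entries without an ID are stored under the key None.
no_id_words = output.get(None, set())
print(sorted(no_id_words))
# ['driver', 'truck']
```

Using output.get(None, set()) rather than output[None] avoids creating an empty None entry when every item has an ID.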
Upvotes: 2