user12904074


How to group the elements of a list by a field in each element?

I have a list. Each element of my list looks like this:

list[0]={'Keywords': ' foster care case aide ',
 'categoryId': '1650',
 'result': {'categoryId': '1650',
  'categoryName': 'case aide',
  'score': '1.04134220123291'}}

Can I collect all keywords that have the same categoryId into one group, and count how many keywords there are for each categoryId?

Please let me know if this is not possible.

Upvotes: 0

Views: 57

Answers (1)

Tom

Reputation: 8790

You could use collections.defaultdict to build a set for each categoryId and add the associated words to it:

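from collections import defaultdict

output = defaultdict(set)

# `data` is the list of dicts from the question; naming it `list`
# would shadow the built-in type
for entry in data:
    # split() with no argument trims the ends and ignores repeated spaces
    kwds = entry['Keywords'].split()
    for word in kwds:
        output[entry['categoryId']].add(word)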

I'm using a set because I assumed you don't want repeated words within each categoryId. If you do want to keep repeats, you could use a list or some other collection instead.

You can then get the number of words for each ID:
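As a rough sketch of the list alternative mentioned above: the version below keeps repeated words. The sample `data` is made up for illustration (only the first entry mirrors the question), and the variable names are assumptions.

```python
from collections import defaultdict

# Hypothetical sample data shaped like the entries in the question
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide jobs ', 'categoryId': '1650'},
]

output = defaultdict(list)  # list instead of set, so repeats are kept

for entry in data:
    # split() with no argument also drops the leading/trailing spaces
    output[entry['categoryId']].extend(entry['Keywords'].split())

print(output['1650'])
```

With a list, 'case' and 'aide' each appear twice for ID 1650, so len() counts every occurrence rather than distinct words.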

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')

# ID: 1650, words: 4
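If each 'Keywords' string should count as one keyword (the whole phrase) rather than being split into words, a possible variant is sketched below. This is an assumption about what the question means by "keyword"; the sample `data` and the names `groups` and `counts` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; only the first entry comes from the question
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' case aide ', 'categoryId': '1650'},
    {'Keywords': ' registered nurse ', 'categoryId': '2001'},
]

groups = defaultdict(set)
for entry in data:
    # keep the whole phrase, only trimming surrounding whitespace
    groups[entry['categoryId']].add(entry['Keywords'].strip())

# number of distinct keyword phrases per categoryId
counts = {cat_id: len(phrases) for cat_id, phrases in groups.items()}
print(counts)
```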

Responding to the comments from OP:

If you are getting KeyError: 'categoryId', that means some entries do not have the key 'categoryId'. If you want to simply skip those entries, you can add a small catch into the above solution:

for entry in data:
    # skip entries that have no categoryId field
    if entry.get('categoryId') is None:
        continue

    # otherwise the same as before
    kwds = entry['Keywords'].split()
    for word in kwds:
        output[entry['categoryId']].add(word)

If there is no categoryId, the entry will be skipped.

Note that this also depends on a "Keywords" field being present, so you may need a similar check for that field.
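A minimal sketch of guarding both fields at once might look like this. The sample `data` is hypothetical and built so that one entry lacks 'categoryId' and another lacks 'Keywords'.

```python
from collections import defaultdict

# Hypothetical sample data: the second entry lacks 'categoryId',
# the third lacks 'Keywords'
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' orphan keyword '},
    {'categoryId': '1650'},
]

output = defaultdict(set)

for entry in data:
    cat_id = entry.get('categoryId')
    keywords = entry.get('Keywords')
    # skip the entry if either field is missing
    if cat_id is None or keywords is None:
        continue
    output[cat_id].update(keywords.split())

for k, v in output.items():
    print(f'ID: {k}, words: {len(v)}')
```

Only the first entry contributes here, because the other two are each missing one of the required fields.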

Or, if you want to collect all the keywords from entries without an ID, you can just use dict.get() in the original solution:

for entry in data:
    kwds = entry['Keywords'].split()  # no-argument split() also ignores extra spaces
    for word in kwds:
        output[entry.get('categoryId', None)].add(word)

Now if there is no categoryId, the keywords will be added to the key None in output.
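To illustrate reading that bucket back, here is a small sketch with made-up sample `data` in which the second entry has no 'categoryId'.

```python
from collections import defaultdict

# Hypothetical sample data: the second entry has no 'categoryId'
data = [
    {'Keywords': ' foster care case aide ', 'categoryId': '1650'},
    {'Keywords': ' unlabeled keyword '},
]

output = defaultdict(set)

for entry in data:
    for word in entry['Keywords'].split():
        # get() returns None when the key is missing
        output[entry.get('categoryId')].add(word)

# words from entries without an ID end up under the key None
print(sorted(output[None]))
```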

Upvotes: 2
