CPhillips

Reputation: 59

Condense a Python Dictionary with Similar Keys

I currently have a dictionary with several keys that are similar but formatted differently (Visual Studio, Visual studio / JavaScript, Javascript, javascript).

How would I condense the dictionary so there's only one of each such key (Visual Studio, JavaScript, etc.) rather than the variants above?

Note: Elements such as Vue and Vue.js are meant to be separate keys.

Is there something obvious that I'm missing?

Code for reference

def getVal(keys, data):
    techCount = dict()
    other = 0
    remList = []

    # Initialize Dictionary with Keys
    for item in keys:
        techCount[item] = 0

    # Load Values into Dictionary
    for item in data:
        techCount[item] += 1

    # Creates the 'Other' field
    for key, val in techCount.items():
        if val <= 1:
            other += 1
            remList.append(key)

    techCount['Other'] = other

    # Remove Redundant Keys
    for item in remList:
        techCount.pop(item)

    
    # Sort the Dictionary
    techCount = {key: val for key, val in sorted(
        techCount.items(), key=lambda ele: ele[1])}

    # Break up the Data
    keys = techCount.keys()
    techs = techCount.values()

    return keys, techs

Full List:

JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4       
Azure: 4        
AngularJs: 2
Java: 3
Visual Studio: 5
SQL: 4
Javascript: 5
Typescript: 3
AngularJS: 3
WordPress: 2
Zoho: 3
Drupal: 2
CSS: 9
.NET: 3
Python: 6
ReactJS: 3
HTML: 8
ASP.NET: 2
PHP: 2
Jira: 2
Other: 43

Upvotes: 1

Views: 190

Answers (3)

Dac2020

Reputation: 165

It is basically what has already been said. Unify the keys by converting them to lowercase and then add up the values of the repeated keys.

data = {'JavaScript': 3,'C#': 9,'Visual studio': 2,'Docker': 4, 'Azure': 4,'AngularJs': 2,'Java': 3,'Visual Studio': 5,'SQL': 4,'Javascript': 5,'Typescript': 3,'AngularJS': 3,'WordPress': 2,'Zoho': 3,'Drupal': 2,'CSS': 9,'.NET': 3,'Python': 6,'ReactJS': 3,'HTML': 8,'ASP.NET': 2,'PHP': 2,'Jira': 2,'Other': 43}

new_dict  = {}  # index -> (name, value, lowercase name)
keys_list = []  # all keys, lowercased, in original order

for index,name in enumerate(data):
    new_dict[index] = (name, data[name], name.lower())
    keys_list.append(name.lower())

not_repeated_keys = [] # [key, key, key, ...etc]
repeated          = [] # [[key, value], [key, value], ...]
final_data        = [] # final data in list format [[key, value], [key, value], ...]

for index, name in enumerate(keys_list):
    if name not in not_repeated_keys:
        not_repeated_keys.append(name)
        final_data.append([name,new_dict[index][1]]) # [key, value]
    else:
        repeated.append([name, new_dict[index][1]])  # [key, value]
        
for pair in final_data:
    for rep in repeated:
        # if the same name
        if pair[0] == rep[0]:
            # sum the values 
            pair[1] = pair[1] + rep[1]


result = {}

for x in final_data:
    result[x[0]] = x[1]
           
print("Final dict: ", result, "\n")

https://onlinegdb.com/nkKgw_b-g
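
For comparison, here is a shorter sketch of the same merge that also keeps a readable spelling for each key instead of the lowercase form. The function name merge_case_variants and the rule "keep the spelling with the largest individual count" are my own assumptions, not something from the question; any other rule for picking the display name would work just as well.

```python
def merge_case_variants(counts):
    """Merge keys that differ only by case, summing their values.

    Hypothetical helper: the display spelling kept for each group is the
    variant with the largest individual count (first seen wins on ties).
    """
    totals = {}  # lowercase key -> summed count
    best = {}    # lowercase key -> (largest single count, its spelling)
    for name, count in counts.items():
        norm = name.lower()
        totals[norm] = totals.get(norm, 0) + count
        if norm not in best or count > best[norm][0]:
            best[norm] = (count, name)
    # Rebuild the result under the chosen display spelling
    return {best[norm][1]: total for norm, total in totals.items()}


# Subset of the data from this answer
data = {'JavaScript': 3, 'Visual studio': 2, 'AngularJs': 2,
        'Visual Studio': 5, 'Javascript': 5, 'AngularJS': 3, 'Python': 6}

merged = merge_case_variants(data)
print(merged)
```

With this rule 'Visual studio' and 'Visual Studio' end up under 'Visual Studio', because that spelling had the higher count on its own.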

Upvotes: 0

Lucas Roberts

Reputation: 1343

How you solve this really depends on how the data is structured: is it a list, a dictionary, or a string? Here I'll assume the data is in a dict, which seems the most likely given that it looks like this:

JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4       
Azure: 4        
AngularJs: 2
Java: 3
Visual Studio: 5

It seems like the problem is solely one of mixed-case characters. If you convert all to lowercase you'll get some collisions that you want to aggregate. Here is one way:

tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = dict()

for item in tech_count.items():
    norm_key = item[0].lower()
    if norm_key not in consolidated:
        consolidated[norm_key] = item[1]
    else:
        consolidated[norm_key] += item[1]

print(consolidated)

Or, if you want to do this more succinctly, as suggested by @juanpa.arrivillaga, you could use dict.get with a default of 0:

tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = dict()

for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]

print(consolidated)

A more specialized data structure for this sort of thing is collections.Counter, which ships with Python. One benefit of a Counter is that looking up a key you have not yet seen returns 0, which leaves fewer edge cases to handle.

With a Counter, one way would look like this:

from collections import Counter
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = Counter()

for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] += item[1]

print(consolidated)
consolidated['assembly'] # returns 0 

Now consolidated will have the sum of the counts from the colliding key-value pairs in the original dictionary. If the keys need further transformations of this kind, you could write a separate function that takes a string as input and use it in place of item[0].lower().
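
A minimal sketch of that idea, assuming the extra cleanup you want is trimming and collapsing whitespace on top of ignoring case. The names normalize_key and consolidate are made up for illustration, and the Vue counts are invented just to show that keys which differ by more than case stay separate.

```python
from collections import Counter


def normalize_key(name):
    """Hypothetical normalizer: trim, collapse inner whitespace, ignore case."""
    return " ".join(name.split()).casefold()


def consolidate(counts, normalize=normalize_key):
    """Sum the values of all keys that normalize to the same string."""
    merged = Counter()
    for name, count in counts.items():
        merged[normalize(name)] += count
    return merged


tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5,
              'Javascript': 5,
              'Vue': 1, 'Vue.js': 1}  # made-up counts for illustration

consolidated = consolidate(tech_count)
print(consolidated)
```

Because the normalizer is passed in as an argument, you can swap in a different function (for example one that only lowercases) without touching the loop.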

Upvotes: 2

Lautaro Ariel Araujo

Reputation: 11

If you can standardize the different capitalizations of the same word, you can condense the dictionary. How can we achieve this? Simply make every key lowercase when building your dictionary:

# Initialize Dictionary with Keys
for item in keys:
    techCount[item.lower()] = 0

# Load Values into Dictionary
for item in data:
    techCount[item.lower()] += 1
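
If data is a plain list of strings, as it appears to be in the question's getVal, the two loops above can probably be collapsed into a single collections.Counter call. This is only a sketch: the sample list below is made up, and it assumes you do not need zero-count entries for keys that never occur in data.

```python
from collections import Counter

# Made-up sample standing in for the question's `data` list
data = ['JavaScript', 'Javascript', 'javascript',
        'Visual Studio', 'Visual studio', 'Vue', 'Vue.js']

# Lowercase each item as it is counted, so case variants share one key
tech_count = Counter(item.lower() for item in data)
print(tech_count)
```

Since a Counter starts every key at 0, there is no need to initialize the dictionary first, and an item missing from keys can no longer raise a KeyError.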

Upvotes: 1
