Reputation: 59
I currently have a dictionary with several keys that are similar but formatted differently (Visual Studio, Visual studio / JavaScript, Javascript, javascript).
How would I condense the dictionary so there's only one of each key (Visual Studio, JavaScript, etc.) rather than the duplicates in the example above?
Note: Elements such as Vue and Vue.js are meant to be separate keys.
Is there something obvious that I'm missing?
Code for reference
def getVal(keys, data):
    techCount = dict()
    other = 0
    remList = []
    # Initialize Dictionary with Keys
    for item in keys:
        techCount[item] = 0
    # Load Values into Dictionary
    for item in data:
        techCount[item] += 1
    # Creates the 'Other' field
    for key, val in techCount.items():
        if val <= 1:
            other += 1
            remList.append(key)
    techCount['Other'] = other
    # Remove Redundant Keys
    for item in remList:
        techCount.pop(item)
    # Sort the Dictionary
    techCount = {key: val for key, val in sorted(
        techCount.items(), key=lambda ele: ele[1])}
    # Break up the Data
    keys = techCount.keys()
    techs = techCount.values()
    return keys, techs
Full List:
JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4
Azure: 4
AngularJs: 2
Java: 3
Visual Studio: 5
SQL: 4
Javascript: 5
Typescript: 3
AngularJS: 3
WordPress: 2
Zoho: 3
Drupal: 2
CSS: 9
.NET: 3
Python: 6
ReactJS: 3
HTML: 8
ASP.NET: 2
PHP: 2
Jira: 2
Other: 43
Upvotes: 1
Views: 190
Reputation: 165
It is basically what has already been said. Unify the keys by converting them to lowercase and then adding the values of the repeated keys.
data = {'JavaScript': 3,'C#': 9,'Visual studio': 2,'Docker': 4, 'Azure': 4,'AngularJs': 2,'Java': 3,'Visual Studio': 5,'SQL': 4,'Javascript': 5,'Typescript': 3,'AngularJS': 3,'WordPress': 2,'Zoho': 3,'Drupal': 2,'CSS': 9,'.NET': 3,'Python': 6,'ReactJS': 3,'HTML': 8,'ASP.NET': 2,'PHP': 2,'Jira': 2,'Other': 43}
new_dict = {}  # {index: (name, value, lowercase name)}
keys_list = []  # all keys in lowercase
for index, name in enumerate(data):
    new_dict[index] = (name, data[name], name.lower())
    keys_list.append(name.lower())
not_repeated_keys = []  # [key, key, key, ...]
repeated = []  # [[key, value], [key, value], ...]
final_data = []  # final data in list format [[key, value], [key, value], ...]
for index, name in enumerate(keys_list):
    if name not in not_repeated_keys:
        not_repeated_keys.append(name)
        final_data.append([name, new_dict[index][1]])  # [key, value]
    else:
        repeated.append([name, new_dict[index][1]])  # [key, value]
for pair in final_data:
    for rep in repeated:
        # if the same name
        if pair[0] == rep[0]:
            # sum the values
            pair[1] = pair[1] + rep[1]
result = {}
for x in final_data:
    result[x[0]] = x[1]
print("Final dict: ", result, "\n")
https://onlinegdb.com/nkKgw_b-g
Upvotes: 0
Reputation: 1343
How you solve this really depends on how data is structured: is it a list, a dictionary, or a string? Here I'll assume the data are in a dict(), which seems the most likely given that the data look like:
JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4
Azure: 4
AngularJs: 2
Java: 3
Visual Studio: 5
It seems like the problem is solely one of mixed-case characters. If you convert all to lowercase you'll get some collisions that you want to aggregate. Here is one way:
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = dict()
for item in tech_count.items():
    norm_key = item[0].lower()
    if norm_key not in consolidated:
        consolidated[norm_key] = item[1]
    else:
        consolidated[norm_key] += item[1]
print(consolidated)
Or, if you want to do this more succinctly, as suggested by @juanpa.arrivillaga, you could use dict.get with a default of 0:
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = dict()
for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]
print(consolidated)
A more specialized data structure for this sort of thing is collections.Counter, which ships with Python. One benefit of a Counter is that querying for a key you have not yet seen returns 0, which can make for fewer edge cases to consider.
With a Counter, one way would look like this:
from collections import Counter
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = Counter()
for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] += item[1]
print(consolidated)
consolidated['assembly'] # returns 0
Now consolidated will have the sum of the counts from the colliding key-value pairs in the original dictionary. If the keys need further, similar transformations, you could write a separate function that takes a string as input and returns the normalized key, and call it in place of item[0].lower().
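A separate normalization function of the kind described above could be sketched like this. The name normalize_key, the whitespace handling, and the sample data are assumptions for illustration only; the original answer only lowercases the keys. Note that 'Vue' and 'Vue.js' stay separate, as the question asks.

```python
from collections import Counter


def normalize_key(name: str) -> str:
    """Hypothetical helper: ignore case and stray whitespace in a key."""
    # name.split() drops leading/trailing whitespace and splits on runs of
    # whitespace, so joining with a single space squeezes inner gaps too.
    return " ".join(name.split()).casefold()


# Made-up sample data, including a key with extra spaces
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual  Studio ': 5,
              'Javascript': 5, 'Vue': 1, 'Vue.js': 2}

consolidated = Counter()
for name, count in tech_count.items():
    consolidated[normalize_key(name)] += count

print(consolidated)
```

Keeping the rule in one function means any later adjustment (for example, treating hyphens like spaces) only has to be made in one place.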
Upvotes: 2
Reputation: 11
If you standardize the different capitalizations of the same word, you can properly "condense" the dictionary. How can we achieve this? Simply make every key lowercase when building your dictionary:
# Initialize Dictionary with Keys
for item in keys:
    techCount[item.lower()] = 0
# Load Values into Dictionary
for item in data:
    techCount[item.lower()] += 1
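Lowercasing every key also changes how the keys are displayed ('visual studio' instead of 'Visual Studio'). If you want a readable label for each merged entry, one possible approach is to count under the lowercase key but remember the original spellings, then label each entry with its most frequent spelling. This is only a sketch: the names totals and spellings and the sample list are assumptions, and "most frequent spelling wins" is just one possible rule.

```python
from collections import Counter, defaultdict

# Made-up sample of raw entries, standing in for the question's `data`
data = ['JavaScript', 'JavaScript', 'Javascript', 'Visual Studio',
        'Visual studio', 'Visual Studio', 'Vue', 'Vue.js']

totals = Counter()                # lowercase key -> total count
spellings = defaultdict(Counter)  # lowercase key -> count per original spelling
for item in data:
    key = item.lower()
    totals[key] += 1
    spellings[key][item] += 1

# Label each merged entry with its most frequent original spelling
tech_count = {spellings[key].most_common(1)[0][0]: total
              for key, total in totals.items()}

print(tech_count)
# {'JavaScript': 3, 'Visual Studio': 3, 'Vue': 1, 'Vue.js': 1}
```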
Upvotes: 1