Reputation: 21
I'm relatively new to Python, actually to programming as a whole. Unfortunately, I have not been able to find an answer to my question on the forum yet.
I have a list of file extensions in which some extensions occur multiple times. For example:
extensions = ["JPG", "XLSX", "MP3", "PDF", "EXE", "PY", "XLSX", "DOCX", "JPG", "PPTX"]
I want to create a new list of dictionaries using the above list. It should look like this:
dicts = [{"Extension": "py", "Count": 1}, {"Extension": "docx", "Count": 1}]
My plan is to iterate over the list and append each file extension to the new list as a new dictionary, as shown in the line of code above. If the extension already exists as a dictionary in the list of dictionaries, only the ["Count"] value of the matching dictionary should be incremented with += 1. I have written the following code, but it does not work.
I know that the empty extensionlist within the function is one problem, but I still can't get it to work as intended. I would appreciate any help.
extensions = ["JPG", "XLSX", "MP3", "PDF", "EXE", "PY", "XLSX", "DOCX", "JPG", "PPTX"]
def get_extensions(extensions):
    extensionlist = []
    for item in extensions:
        extension = item.lower()
        for dictionary in extensionlist:
            if dictionary["Extension"] == extension:
                dictionary["Count"] += 1
                break
            else:
                extensionlist.append({"Extension": extension, "Count": 1})
                break
    return extensionlist
test = get_extensions(extensions)
print(test)
Upvotes: 2
Views: 102
Reputation: 1535
You can build the frequency table with a Counter and then iterate over that to construct your list:
from collections import Counter
extensions = ["JPG", "XLSX", "MP3", "PDF", "EXE", "PY", "XLSX", "DOCX", "JPG", "PPTX"]
frequencies = Counter(extensions)
# Build a list of dicts using a list comprehension. Not
# really sure why you'd want it in this format (rather
# than a dictionary).
output = [
    {"Extension": ext.lower(), "Count": freq}
    for ext, freq in frequencies.items()
]
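One thing to watch with the snippet above: Counter counts the strings exactly as given, so if the input mixed cases (say "JPG" and "jpg"), lowercasing only inside the comprehension would produce two separate "jpg" entries. A minimal sketch, assuming you want case-insensitive counts, lowercases before counting. The function name count_extensions is made up for illustration.

```python
from collections import Counter


def count_extensions(extensions):
    """Return a list of {"Extension", "Count"} dicts, counted case-insensitively."""
    # Lowercase first so "JPG" and "jpg" land in the same bucket.
    frequencies = Counter(ext.lower() for ext in extensions)
    return [
        {"Extension": ext, "Count": freq}
        for ext, freq in frequencies.items()
    ]


result = count_extensions(["JPG", "jpg", "PDF"])
print(result)
```

With the sample list from the question this makes no difference, since every extension there is already uppercase.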
If you wanted to do this "manually" using a for loop, I'd suggest a similar approach: first construct a dictionary of extension keys to frequency counts, and then construct the list:
frequencies = {}
for extension in extensions:
    # d.get(key, default) is like [], except it
    # returns default if key is not in d (rather than
    # throwing a KeyError).
    frequencies[extension] = frequencies.get(extension, 0) + 1
# This is less idiomatic than the list comprehension
# shown above, but it's the same end result.
output = []
for extension, frequency in frequencies.items():
    output.append(...)
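For reference, here is one way the manual version might look end to end. This is a sketch, not the only way to fill in the elided append: it assumes the appended item is the same {"Extension": ..., "Count": ...} dict used elsewhere in this thread, and it lowercases each extension before counting. The name count_extensions_manual is hypothetical.

```python
def count_extensions_manual(extensions):
    """Count extensions with a plain dict, then build the list of dicts."""
    frequencies = {}
    for extension in extensions:
        key = extension.lower()  # assumption: counts are case-insensitive
        frequencies[key] = frequencies.get(key, 0) + 1

    output = []
    for extension, frequency in frequencies.items():
        # Assumed shape of the appended item, matching the question's format.
        output.append({"Extension": extension, "Count": frequency})
    return output


extensions = ["JPG", "XLSX", "MP3", "PDF", "EXE", "PY", "XLSX", "DOCX", "JPG", "PPTX"]
manual_output = count_extensions_manual(extensions)
print(manual_output)
```

Because dictionaries keep insertion order, the entries come out in the order each extension first appears in the input.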
This is better than your nested for loop because it makes one pass over extensions and then a second pass over frequencies. Even if your current implementation worked, it would do a linear scan over the output list every time it needs to determine whether the list already contains a specific extension. In the worst case, when every extension is distinct, those scans cover 1, 2, ..., n elements of the output list, which is quadratic in the number of extensions.
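To make the cost of the repeated scan concrete, here is a small sketch that counts how many dictionary comparisons the list-scanning approach performs. The helper count_scan_comparisons is invented for this illustration and is not part of the question's code.

```python
def count_scan_comparisons(extensions):
    """Count equality checks made when scanning a list of dicts for each item."""
    seen = []
    comparisons = 0
    for item in extensions:
        found = False
        for entry in seen:
            comparisons += 1
            if entry["Extension"] == item:
                entry["Count"] += 1
                found = True
                break
        if not found:
            seen.append({"Extension": item, "Count": 1})
    return comparisons


# Worst case: 100 distinct extensions, so every scan runs to the end.
distinct = [f"ext{i}" for i in range(100)]
print(count_scan_comparisons(distinct))  # 0 + 1 + ... + 99 = 4950
```

A dictionary lookup, by contrast, takes roughly constant time per extension on average, so the dict-based versions do about one lookup per input item.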
Upvotes: 5
Reputation: 616
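Your code is almost right. The problem is that the else branch is never reached: extensionlist starts out empty, so the body of the inner loop never runs. Unindent the else so that it belongs to the for loop instead of the if, drop the break inside it, and it works. The else clause of a for loop runs only when the loop finishes without hitting break.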
extensions = ["JPG", "XLSX", "MP3", "PDF", "EXE", "PY", "XLSX", "DOCX", "JPG", "PPTX"]
def get_extensions(extensions):
    extensionlist = []
    for item in extensions:
        extension = item.lower()
        for dictionary in extensionlist:
            if dictionary["Extension"] == extension:
                dictionary["Count"] += 1
                break
        else:
            extensionlist.append({"Extension": extension, "Count": 1})
    return extensionlist
test = get_extensions(extensions)
print(test)
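The fix above relies on Python's for ... else construct, which is easy to misread. As a standalone illustration (the function find_first_even is a made-up example, unrelated to the extension code), the else block runs only when the loop ends without a break:

```python
def find_first_even(numbers):
    """Return the first even number, or None if there is none."""
    for n in numbers:
        if n % 2 == 0:
            result = n
            break  # skips the else block below
    else:
        # Reached only when the loop finished without break,
        # including when numbers is empty.
        result = None
    return result


print(find_first_even([3, 5, 8, 9]))  # 8
print(find_first_even([3, 5, 7]))     # None
print(find_first_even([]))            # None
```

The empty-list case is the one that matters for the question: on the first extension, extensionlist is empty, the inner loop body never runs, and the else clause appends the first dictionary.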
Upvotes: 2