Reputation: 1191
I have the following list:
files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
I want to count the number of items with a particular file extension and store it in a dictionary.
Expected output is:
extn_dict = {'jpg': 3, 'mov': 2, 'pdf': 4}
I'm writing the following code:
for item in files_list:
    extn_dict[item[-3:]] = count(item)  # I understand I should not have count() here but I'm not sure how to count them.
How can I count the number of items in the list with a particular extension?
Upvotes: 0
Views: 1401
Reputation: 901
>>> from collections import Counter
>>> files_list
['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
>>> c = Counter(x.split(".")[-1] for x in files_list)
>>> c
Counter({'pdf': 4, 'jpg': 3, 'mov': 2})
>>>
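Since the question asks for a plain dictionary: Counter is a dict subclass, so the result can be used as-is or converted with dict(). A minimal sketch, assuming the same files_list as in the question:

```python
from collections import Counter

files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov',
              'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']

# Count the text after the last dot in each file name.
c = Counter(name.split(".")[-1] for name in files_list)

# Counter subclasses dict, so dict() gives an ordinary dictionary.
extn_dict = dict(c)
print(extn_dict == {'jpg': 3, 'mov': 2, 'pdf': 4})  # True

# most_common() lists (extension, count) pairs, highest count first.
print(c.most_common(1))  # [('pdf', 4)]
```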
Upvotes: 12
Reputation: 47169
The easiest way is probably:
>>> d = {}
>>> for item in files_list:
...     d[item[-3:]] = d.get(item[-3:], 0) + 1
...
>>> d
{'pdf': 4, 'mov': 2, 'jpg': 3}
Upvotes: 2
Reputation: 4213
files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
extension_set = [i.split('.')[-1] for i in files_list]
d = {j:extension_set.count(j) for j in extension_set}
print(d)
Analysis:
Current method - 10000 loops, best of 3: 25.3 µs per loop
Counter - 10000 loops, best of 3: 30.5 µs per loop (best of 3: 33.3 µs per loop with import statement)
itertools - 10000 loops, best of 3: 41.1 µs per loop (best of 3: 44 µs per loop with import statement)
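To reproduce a comparison like this with only the standard library, something along these lines should work. The helper names with_count and with_counter are made up for this sketch, and the numbers will differ by machine and Python version. Keep in mind that list.count() rescans the list for every item, so on much longer lists it may fall behind Counter even though it is quick on nine items.

```python
import timeit
from collections import Counter

files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov',
              'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']

def with_count():
    # list.count() walks the whole list once per item
    exts = [name.split('.')[-1] for name in files_list]
    return {ext: exts.count(ext) for ext in exts}

def with_counter():
    # Counter tallies everything in a single pass
    return Counter(name.split('.')[-1] for name in files_list)

# Both approaches should agree before their speed is compared.
assert with_count() == dict(with_counter())

for func in (with_count, with_counter):
    # best of 3 runs of 10000 calls each, reported per call in microseconds
    best = min(timeit.repeat(func, number=10000, repeat=3))
    print("%s: %.1f us per loop" % (func.__name__, best / 10000 * 1e6))
```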
Upvotes: 1
Reputation: 6376
Using Counter and map instead of a list comprehension:
from collections import Counter
Counter(map(lambda x: x.split('.')[-1], files_list))
Upvotes: 0
Reputation: 43494
The easiest way is to loop over the list and use a dictionary to store your counts.
files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov',
              'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
counts = {}
for f in files_list:
    ext = f[-3:]  # assumes a 3-character extension
    if ext not in counts:
        counts[ext] = 0
    counts[ext] += 1
print(counts)
# {'jpg': 3, 'mov': 2, 'pdf': 4} (key order may vary by Python version)
No doubt, there are other fancy solutions, but I think this is easier to understand.
If you can't assume that the extension will always be 3 characters, you can change the ext = line to the following, as other posters have shown in their answers:
    ext = f.split(".")[-1]
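Splitting on "." still miscounts names that have no dot at all, because the whole name ends up as the "extension". If that matters, os.path.splitext from the standard library is one option. Here is a rough sketch; the extra names 'archive.tar.gz' and 'README' are made-up inputs, and '(none)' is just a placeholder label chosen for this example:

```python
import os.path

files_list = ['pic1.jpg', 'movie1.mov', 'doc1.pdf', 'doc2.pdf',
              'archive.tar.gz', 'README']

counts = {}
for f in files_list:
    # splitext returns (root, ext); ext keeps its leading dot, or is '' if there is none
    ext = os.path.splitext(f)[1]
    # drop the dot and lowercase it; label files that have no extension
    key = ext[1:].lower() or '(none)'
    counts[key] = counts.get(key, 0) + 1

print(counts)
```

Note that splitext only looks at the last dot, so 'archive.tar.gz' is counted under 'gz'.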
Upvotes: 1
Reputation: 300
You can use the Counter class from the collections module:
from collections import Counter
files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
temp = []
for item in files_list:
    temp.append(item[-3:])
print(Counter(temp))
>>> Counter({'pdf': 4, 'jpg': 3, 'mov': 2})
Upvotes: 0
Reputation: 71451
You can use itertools.groupby:
import itertools
files_list = ['pic1.jpg', 'pic2.jpg', 'pic3.jpg', 'movie1.mov', 'movie2.mov', 'doc1.pdf', 'doc2.pdf', 'doc3.pdf', 'doc4.pdf']
final_counts = {a:len(list(b)) for a, b in itertools.groupby(sorted(files_list, key=lambda x:x.split('.')[-1]), key=lambda x:x.split('.')[-1])}
Output:
{'pdf': 4, 'mov': 2, 'jpg': 3}
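The sorted() call matters here: groupby only groups consecutive items that share a key, so with unsorted input the same extension can come up more than once and later groups overwrite earlier ones in the dict. A small sketch of the difference, using a shuffled list and a helper named ext that are both made up for this example:

```python
import itertools

def ext(name):
    # text after the last dot
    return name.split('.')[-1]

mixed = ['pic1.jpg', 'doc1.pdf', 'pic2.jpg', 'doc2.pdf', 'pic3.jpg']

# Without sorting, every run of equal keys becomes its own group.
unsorted_counts = {k: len(list(g)) for k, g in itertools.groupby(mixed, key=ext)}
print(unsorted_counts)  # {'jpg': 1, 'pdf': 1}

# Sorting by the same key first puts equal extensions next to each other.
sorted_counts = {k: len(list(g))
                 for k, g in itertools.groupby(sorted(mixed, key=ext), key=ext)}
print(sorted_counts)  # {'jpg': 3, 'pdf': 2}
```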
Upvotes: 0