Reputation: 9
from bs4 import BeautifulSoup
import urllib2
# Imported libraries for future use.
response = urllib2.urlopen('http://www.nytimes.com').read()
soup = BeautifulSoup(response, "lxml")
host = []
# Created an empty list to append the words extracted from the data set.
for story_heading in soup.find_all(class_="story-heading"):
    story_title = story_heading.text.replace("\n", " ").strip()
    new_story_title = story_title.encode('utf-8')
    parts = new_story_title.split()[0]
    i = ['a', 'A', 'an', 'An', 'the', 'The', 'from', 'From', 'to', 'To',
         'when', 'When', 'what', 'What', 'on', 'On', 'for', 'For']
    if parts not in i:
        host.append(parts)
    else:
        pass
# Now I have to count how many times each word is repeated in the list.
print host
Let me know how to count how many times each word is repeated in the list we created. Actually, I am pretty confused about the above code too; if anyone can explain what mistake I made in it, I would be grateful.
Upvotes: 0
Views: 141
Reputation: 28277
Use:
lst = ['hi', 'Hio', 'Hi', 'hello', 'there']
s = set()
# Lower-case every word and add it to the set; the set keeps only unique words.
map(lambda x: s.add(x.lower()), lst)
print(len(s))
OR
lst = ['hi', 'Hio', 'Hi', 'hello', 'there']
s = set()
for item in lst:
    s.add(item.lower())
print(len(s))
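Both snippets give the number of distinct words, ignoring case. Note that on Python 3 map is lazy, so the first variant would never populate the set; a set comprehension expresses the same idea and works on both versions (a minimal sketch):
lst = ['hi', 'Hio', 'Hi', 'hello', 'there']
s = {x.lower() for x in lst}
print(len(s))  # 4 distinct words once case is ignored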
Upvotes: 0
Reputation: 33
The code snippet below does not use a comprehension, so it should be simple to follow.
host = ['Hello', 'foo', 'bar', 'World', 'foo', 'Hello']
dict1 = {}
host_unique = list(set(host))
for i in host_unique:
    dict1[i] = host.count(i)
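For the sample list above, printing the result gives each word with its count (key order may vary):
print(dict1)
# {'Hello': 2, 'foo': 2, 'bar': 1, 'World': 1}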
Upvotes: 0
Reputation: 6776
Using a dictionary comprehension iterating over a set of the elements:
case-sensitive version ("What" != "what"):
occurrences = { item: host.count(item) for item in set(host) }
case-insensitive version ("What" == "what"):
lowered = [item.lower() for item in host]
occurrences = { item: lowered.count(item) for item in set(lowered) }
The dictionary keys will also be the lowercase elements in this case.
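For example, with a small sample list (standing in for the scraped host list) that repeats a word in different cases:
host = ['What', 'what', 'When']
# case-sensitive:   {'What': 1, 'what': 1, 'When': 1}
# case-insensitive: {'what': 2, 'when': 1}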
Upvotes: 1
Reputation: 2188
You can do that with count:
d = {i: host.count(i) for i in set(host)}
print(d)
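Note that count rescans host once for every distinct word, so on very long lists the Counter approach shown in another answer is more efficient.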
Upvotes: 1
Reputation: 2359
Use the Counter class from the collections module:
from bs4 import BeautifulSoup
from collections import Counter
import urllib2
# Imported libraries for future use.
response = urllib2.urlopen('http://www.nytimes.com').read()
soup = BeautifulSoup(response, "lxml")
host = []
# Created an empty list to append the words extracted from the data set.
for story_heading in soup.find_all(class_="story-heading"):
    story_title = story_heading.text.replace("\n", " ").strip()
    new_story_title = story_title.encode('utf-8')
    parts = new_story_title.split()[0]
    i = ['a', 'A', 'an', 'An', 'the', 'The', 'from', 'From', 'to', 'To',
         'when', 'When', 'what', 'What', 'on', 'On', 'for', 'For']
    if parts not in i:
        host.append(parts)
    else:
        pass
# Count how many times each word is repeated in the list.
print Counter(host)
Output:
Counter({'North': 2, 'Trump': 1, 'U.S.': 1, 'Kasich-Cruz': 1, '8': 1, 'Court': 1, 'Where': 1, 'Your': 1, 'Forget': 1})
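If you only want the words that actually repeat, or the most frequent ones, Counter also has a most_common() method; a minimal sketch on the same host list:
counts = Counter(host)
repeated = {word: n for word, n in counts.items() if n > 1}  # words seen more than once
print repeated
print counts.most_common(3)  # the three most frequent first words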
Upvotes: 2