vikhaf

Reputation: 9

How to count words in a list?

from bs4 import BeautifulSoup
import urllib2
# Imported libraries for future use.
response = urllib2.urlopen('http://www.nytimes.com').read()
soup = BeautifulSoup(response,"lxml")

host = []
#created empty list to append future words extracted from data set.
for story_heading in soup.find_all(class_="story-heading"):
    story_title = story_heading.text.replace("\n", " ").strip()
    new_story_title = story_title.encode('utf-8')


    parts = new_story_title.split()[0]

    i=['a','A','an','An','the','The','from','From','to','To','when','When','what','What','on','On','for','For']
    if parts not in i:
        host.append(parts)
    else:
        pass
# Now I have to find the repeated words in the list and count how many times each one repeats.
print host

Let me know how to calculate the number of repeated words in the list I created. I'm also pretty confused about the code above; if anyone can explain what mistake I made in it, I would be grateful.

Upvotes: 0

Views: 141

Answers (5)

Ani Menon

Reputation: 28277

To count the distinct words (ignoring case), use:

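lst = ['hi', 'Hio', 'Hi', 'hello', 'there']
# Build the set directly from map(): on Python 3 map() is lazy, so calling it
# only for its side effect (s.add) would never add anything.
s = set(map(lambda x: x.lower(), lst))
print(len(s))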

OR

lst = ['hi', 'Hio', 'Hi', 'hello', 'there' ]
s = set()
for item in lst:
    s.add(item.lower())
print(len(s))
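Both versions above count distinct words, not repeats. Comparing that number with the total word count gives the number of repeated occurrences. A minimal sketch, assuming the same sample list and case-insensitive matching (the names total, distinct and duplicates are illustrative):

```python
lst = ['hi', 'Hio', 'Hi', 'hello', 'there']

# Lowercase once so 'hi' and 'Hi' are treated as the same word.
words = [w.lower() for w in lst]

total = len(words)             # every word, duplicates included
distinct = len(set(words))     # each word counted once
duplicates = total - distinct  # occurrences beyond the first of each word

print(total)       # 5
print(distinct)    # 4
print(duplicates)  # 1
```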

Upvotes: 0

Nagaraj

Reputation: 33

The snippet below does not use a comprehension, so it should be simple to understand:

host = ['Hello', 'foo', 'bar', 'World', 'foo', 'Hello']
dict1 = {}
host_unique = list(set(host))  # each distinct word once
for i in host_unique:
    dict1[i] = host.count(i)   # how many times the word appears in host
print(dict1)
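Each host.count(i) call scans the whole list again, so the loop above does roughly len(host) × len(set(host)) comparisons. A single pass with dict.get keeps the work linear in the length of the list. A minimal sketch on the same sample list (counts is an illustrative name):

```python
host = ['Hello', 'foo', 'bar', 'World', 'foo', 'Hello']

counts = {}
for word in host:
    # dict.get returns 0 the first time a word is seen.
    counts[word] = counts.get(word, 0) + 1

# Sort so the printed order is the same on every Python version.
print(sorted(counts.items()))  # [('Hello', 2), ('World', 1), ('bar', 1), ('foo', 2)]
```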

Upvotes: 0

Byte Commander

Reputation: 6776

Using a dictionary comprehension iterating over a set of the elements:

  • case-sensitive version ("What" != "what"):

    occurrences = { item: host.count(item) for item in set(host) }
    
  • case-insensitive version ("What" == "what"):

    lowered = [item.lower() for item in host]
    occurrences = { item: lowered.count(item) for item in set(lowered) }
    

    The dictionary keys will also be the lowercase elements in this case.
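The case-insensitive count can also be done in a single pass with collections.Counter by lowercasing each word as it is counted. A minimal sketch; the host list here is a hypothetical sample, not the scraped data:

```python
from collections import Counter

# Hypothetical sample standing in for the scraped host list.
host = ['North', 'north', 'Court', 'court', 'North', 'Trump']

# Lowercase each word on the way in so 'North' and 'north' share one key.
occurrences = Counter(word.lower() for word in host)

print(occurrences['north'])  # 3
print(occurrences['court'])  # 2
print(occurrences['trump'])  # 1
```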

Upvotes: 1

salomonderossi

Reputation: 2188

You can do that with the list's count method:

d = {i: host.count(i) for i in set(host)}
print(d)
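Since the question asks about repeated words, the resulting dictionary can be filtered down to the words that occur more than once. A minimal sketch, assuming a small sample list in place of the scraped host (repeated is an illustrative name):

```python
host = ['Hello', 'foo', 'bar', 'World', 'foo', 'Hello']

d = {i: host.count(i) for i in set(host)}

# Keep only the words that occur more than once.
repeated = {word: n for word, n in d.items() if n > 1}

print(sorted(repeated.items()))  # [('Hello', 2), ('foo', 2)]
print(len(repeated))             # 2 distinct words are repeated
```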

Upvotes: 1

EbraHim

Reputation: 2359

Use the Counter class from the collections module:

from bs4 import BeautifulSoup
from collections import Counter
import urllib2
# Imported libraries for future use.
response = urllib2.urlopen('http://www.nytimes.com').read()
soup = BeautifulSoup(response,"lxml")

host = []
#created empty list to append future words extracted from data set.
for story_heading in soup.find_all(class_="story-heading"):
    story_title = story_heading.text.replace("\n", " ").strip()
    new_story_title = story_title.encode('utf-8')


    parts = new_story_title.split()[0]

    i=['a','A','an','An','the','The','from','From','to','To','when','When','what','What','on','On','for','For']
    if parts not in i:
        host.append(parts)
    else:
        pass
# Now I have to find the repeated words in the list and count how many times each one repeats.
print Counter(host)

Output:

Counter({'North': 2, 'Trump': 1, 'U.S.': 1, 'Kasich-Cruz': 1, '8': 1, 'Court': 1, 'Where': 1, 'Your': 1, 'Forget': 1})
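Note that the loop in the question keeps only the first word of each headline, because new_story_title.split()[0] picks index 0 of the split result; that is why the output above contains words like 'North' and 'Kasich-Cruz' and nothing from later in the titles. If the intent was to count every word, a sketch like the following splits each title fully, drops stop words case-insensitively, and uses Counter.most_common. The headlines list is hypothetical and stands in for the scraped story_title strings so the example runs without network access:

```python
from collections import Counter

# Hypothetical headlines standing in for the scraped story_title strings.
headlines = [
    "North Korea Tests a Missile",
    "The Court Rules on North Carolina Law",
    "What Trump Said About the Court",
]

# Stop words stored in lowercase, so one entry covers 'The', 'the', etc.
stop_words = {'a', 'an', 'the', 'from', 'to', 'when', 'what', 'on', 'for'}

words = []
for title in headlines:
    for word in title.split():  # every word, not just split()[0]
        if word.lower() not in stop_words:
            words.append(word)

counts = Counter(words)
print(counts.most_common(2))  # the two most frequent words with their counts
```

Storing the stop words in lowercase and comparing word.lower() replaces the long list of paired spellings ('a', 'A', 'the', 'The', ...), and a set makes each membership test constant-time.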

Upvotes: 2
