Xavier

Reputation: 87

Python: unable to get any output using BeautifulSoup

I am trying to scrape some words from a website, but the following program shows no errors and prints no output when I try to print the results.

I have checked the code twice and even incorporated an if statement to see whether the program is getting any words or not.


    import requests
    import operator
    from bs4 import BeautifulSoup
        
        
    def word_count(url):

        wordlist = []

        source_code = requests.get(url)

        source = BeautifulSoup(source_code.text, features="html.parser")

        for post_text in source.findAll('a', {'class':'txt'}):
            word_string=post_text.string

            if word_string is not None:
                word = word_string.lower().split()

                for each_word in word:
                    print(each_word)
                    wordlist.append(each_word)

            else:
                print("None")
    
    word_count('https://mumbai.craigslist.org/')

I expect all the words inside the elements with class="txt" to be displayed in the output.

Upvotes: 3

Views: 101

Answers (3)

recnac

Reputation: 3744

I visited https://mumbai.craigslist.org/ and found that there is no <a class="txt"> element, only <span class="txt">, so you can try this:

import requests
from bs4 import BeautifulSoup

def word_count(url):
    wordlist = []
    source_code = requests.get(url)
    source = BeautifulSoup(source_code.text, features="html.parser")
    # The "txt" class is on the <span> elements, not on the <a> elements.
    for post_text in source.findAll('span', {'class': 'txt'}):
        word_string = post_text.text
        if word_string is not None:
            word = word_string.lower().split()
            for each_word in word:
                print(each_word)
                wordlist.append(each_word)
        else:
            print("None")

word_count('https://mumbai.craigslist.org/')

This prints the expected output:

community
activities
artists
childcare
classes
events
general
...
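
To see the difference without hitting the network, here is a minimal sketch against a made-up HTML fragment. The markup and the name SAMPLE_HTML are assumptions modeled on the structure described above, not copied from the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the structure described above;
# it is not copied from the live page.
SAMPLE_HTML = """
<ul>
  <li><a href="/d/activities"><span class="txt">activities</span></a></li>
  <li><a href="/d/artists"><span class="txt">artists</span></a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

# No <a> element carries class "txt", so this result is empty.
anchors = soup.find_all('a', {'class': 'txt'})

# The class sits on the <span> elements instead.
spans = soup.find_all('span', {'class': 'txt'})

print(len(anchors))             # 0
print([s.text for s in spans])  # ['activities', 'artists']
```

An empty result from find_all is not an error, which is why the original loop ran zero times without any message.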

Hope that helps. Leave a comment if you have further questions. :)

Upvotes: 2

DirtyBit

Reputation: 16782

OP: I am expecting all the words of the class text to be displayed in the output

The culprit:

for post_text in source.findAll('a', {'class':'txt'}):

The reason:

The <a> tag does not have the class txt, but the <span> tag inside it does.

Hence:

import requests
from bs4 import BeautifulSoup

def word_count(url):
    source_code = requests.get(url)
    source=BeautifulSoup(source_code.text, features="html.parser")

    for post_text in source.findAll('a'):
        s_text = post_text.find('span', class_ = "txt")
        if s_text is not None:
            print(s_text.text)

word_count('https://mumbai.craigslist.org/')

OUTPUT:

community
activities
artists
childcare
classes
events
general
groups
local news
lost+found
missed connections
musicians
pets
...
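
As an alternative to the nested loop, the same lookup can probably be written with a CSS selector through select(). This is a swapped-in technique, not what the code above does, and the markup and the helper name span_texts are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror the anchor/span nesting described above.
SAMPLE_HTML = (
    '<a href="/d/community"><span class="txt">community</span></a>'
    '<a href="/d/events"><span class="txt">events</span></a>'
    '<a href="/about">about</a>'
)

def span_texts(html):
    """Return the text of every span with class "txt" nested inside an anchor."""
    soup = BeautifulSoup(html, features="html.parser")
    # 'a span.txt' matches <span class="txt"> descendants of any <a>.
    return [span.get_text(strip=True) for span in soup.select('a span.txt')]

print(span_texts(SAMPLE_HTML))  # ['community', 'events']
```

The anchor without a span.txt child is skipped, just as the "is not None" check skips it in the loop version.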

Upvotes: 3

Iakovos Belonias

Reputation: 1373

You are targeting the wrong elements.

If you use

print(source)

everything works fine: the page is downloaded and parsed. The problem starts when you target elements with findAll, because the selector matches nothing and you get an empty list.

If you replace

for post_text in source.findAll('a', {'class':'txt'}):

with

for post_text in source.find_all('a'):

everything seems to work fine.
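
A quick way to check what a selector is hitting is to test for an empty result and then list which tag names actually carry the class. This is only a sketch; the one-line markup below is an assumption, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical one-line sample; the real page has many such links.
SAMPLE_HTML = '<a class="cal" href="/cal"><span class="txt">events</span></a>'
soup = BeautifulSoup(SAMPLE_HTML, features="html.parser")

# An empty result means the tag/class combination does not exist in the page.
matches = soup.find_all('a', {'class': 'txt'})
if not matches:
    print("selector matched nothing")

# Search by class alone to find out which tags really use it.
tag_names = sorted({tag.name for tag in soup.find_all(class_='txt')})
print(tag_names)  # ['span']
```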

Upvotes: 2
