extract text from specific sections in html, python

Question

I'm trying to write a program that shows you the lyrics of a song, but I'm stuck on this error:

AttributeError: 'NoneType' object has no attribute 'text'

here's the code:

def get_lyrics(url):
    lyrics_html = requests.get(url)
    soup = BeautifulSoup(lyrics_html.content, "html.parser")
    lyrics = soup.find('div', {"class": "lyrics"})
    return lyrics.text

This is the site I take the lyrics from. I can't work out what's wrong. As an example, I'll look up the lyrics of this song; here is its lyrics page: click. You can see for yourself that on the page the lyrics sit inside a div with class "lyrics". All lyrics pages on this site are built this way. Can someone help me, please? Thanks.

Andrej Kesely · Accepted Answer

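The site serves two versions of the page (probably to confuse scrapers and bots): one where the lyrics are in a tag whose class begins with "Lyrics__Container...", and one where they are in a tag with class lyrics. When your request gets the first version, soup.find('div', {"class": "lyrics"}) returns None, and calling .text on None raises the AttributeError. If a tag with class Lyrics__Container is not found, the lyrics are inside the tag with class lyrics.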

This should always print the lyrics:

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

text = soup.select_one('div[class^="Lyrics__Container"], .lyrics').get_text(strip=True, separator='\n')
print(text)

Prints:

[Intro]
Ah, mhh (ehi)
Ho la bocca piena
Va bene
[Verse]
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, Ciao mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis
Si, si, si Sal
A a a a Si si si si si si
Proprio così mi chiamo io
Ciao mi chiamo Luis Aah

... and so on.
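
To check the selector logic without hitting the network, here is a minimal sketch that runs it against two made-up HTML snippets. PAGE_OLD, PAGE_NEW, the class suffix "abc123" and the helper extract_lyrics are all hypothetical names for illustration, not something the site or the code above defines. It also guards against the case where neither tag is found, so you get None back instead of another AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two page variants (not real site markup).
PAGE_OLD = '<div class="lyrics">[Intro]<br/>Ah, mhh</div>'
PAGE_NEW = '<div class="Lyrics__Container-abc123">[Intro]<br/>Ah, mhh</div>'


def extract_lyrics(html):
    """Return the lyrics text, or None if neither container is present."""
    soup = BeautifulSoup(html, "html.parser")
    # Match either a div whose class starts with Lyrics__Container
    # or any tag with class "lyrics".
    tag = soup.select_one('div[class^="Lyrics__Container"], .lyrics')
    if tag is None:
        return None
    return tag.get_text(strip=True, separator="\n")


# The original lookup finds nothing on the second variant, which is
# what leads to "'NoneType' object has no attribute 'text'".
missing = BeautifulSoup(PAGE_NEW, "html.parser").find("div", {"class": "lyrics"})
print(missing)

print(extract_lyrics(PAGE_OLD))
print(extract_lyrics(PAGE_NEW))
```

Both calls to extract_lyrics return the same two lines of text, while the plain find on the second variant gives None.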

EDIT: Updated version:

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Avicii-the-nights-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

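def get_text(elements):
    text = ''
    for c in elements:
        # Unwrap inline <a>/<span> tags so their text stays on the same line
        for t in c.select('a, span'):
            t.unwrap()
        if c:
            # Merge the adjacent strings left behind by unwrap()
            c.smooth()
            text += c.get_text(strip=True, separator='\n')
    return text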


cs = soup.select('div[class^="Lyrics__Container"]')
if cs:
    text = get_text(cs)
else:
    text = get_text(soup.select('.lyrics'))

print(text)

Prints:

[Verse 1]
(Hey)
Once upon a younger year
When all our shadows disappeared
The animals inside came out to play (Hey)
Hey, went face to face with all our fears
Learned our lessons through the tears
Made memories we knew would never fade
[Pre-Chorus]
One day my father he told me
Son, don't let it slip away

...etc.
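
The updated version unwraps a and span tags and calls smooth() before extracting the text. The answer does not spell out why, so take this as my assumption: parts of a lyric line can be wrapped in inline tags, and get_text with a separator would then put each fragment on its own line. A small sketch on made-up HTML (the HTML string and the two helper names are hypothetical) shows the difference:

```python
from bs4 import BeautifulSoup

# Made-up snippet: part of the first line is wrapped in an <a> tag.
HTML = (
    '<div class="lyrics">Once upon <a href="#">a younger</a> year'
    '<br/>When all our shadows</div>'
)


def lines_naive(html):
    """Extract text directly; every text fragment becomes its own line."""
    div = BeautifulSoup(html, "html.parser").div
    return div.get_text(strip=True, separator="\n")


def lines_unwrapped(html):
    """Unwrap inline tags and merge neighbouring strings before extracting."""
    div = BeautifulSoup(html, "html.parser").div
    for tag in div.select("a, span"):
        tag.unwrap()   # replace the tag with its contents
    div.smooth()       # join adjacent strings into one
    return div.get_text(strip=True, separator="\n")


print(lines_naive(HTML))
print("---")
print(lines_unwrapped(HTML))
```

The naive version splits the first lyric line into three pieces, while the unwrapped version keeps it as one line and only breaks at the br tag.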
