aleee

Reputation: 120

Extract text from specific sections of HTML in Python

I'm trying to write a program that shows you the lyrics of a song, but I'm stuck on this error:

AttributeError: 'NoneType' object has no attribute 'text'

Here's the code:

def get_lyrics(url):
    lyrics_html = requests.get(url)
    soup = BeautifulSoup(lyrics_html.content, "html.parser")
    lyrics = soup.find('div', {"class": "lyrics"})
    return lyrics.text

This is the site I take the lyrics from. I can't work out what's wrong. For example, if I search for the lyrics of this song, here are the lyrics: click. You can see for yourself that on the page the lyrics sit in a div with class "lyrics". All lyrics pages on this site are built this way. Can someone help me, please? Thanks.

Upvotes: 1

Views: 258

Answers (2)

Andrej Kesely

Reputation: 195408

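The site serves two versions of the page (probably to confuse scrapers and bots): one where the lyrics are in tags whose class begins with "Lyrics__Container..." and one where they are in a tag with class lyrics. If no tag with a Lyrics__Container class is found, the lyrics are inside the tag with class lyrics. When you receive the first version, soup.find('div', {"class": "lyrics"}) returns None, and calling .text on None raises the AttributeError.

This should print the lyrics with either version: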

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

text = soup.select_one('div[class^="Lyrics__Container"], .lyrics').get_text(strip=True, separator='\n')
print(text)

Prints:

[Intro]
Ah, mhh (ehi)
Ho la bocca piena
Va bene
[Verse]
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, Ciao mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis
Si, si, si Sal
A a a a Si si si si si si
Proprio così mi chiamo io
Ciao mi chiamo Luis Aah

... and so on.
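
The same lookup can also be guarded so that a page with neither container does not raise the AttributeError from the question. This is only a minimal sketch under the two markup variants described above; extract_lyrics and the sample HTML strings are hypothetical names made up for illustration, and the function takes the HTML as a string so it can be tried without a network request.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html):
    """Return the lyrics text from an HTML string, or None if no container matches."""
    soup = BeautifulSoup(html, "html.parser")
    # Matches either markup variant; select_one returns the first match
    # in document order, or None when nothing matches.
    node = soup.select_one('div[class^="Lyrics__Container"], .lyrics')
    if node is None:
        return None
    return node.get_text(strip=True, separator="\n")


# Hypothetical, hand-written samples standing in for the two page versions.
OLD_PAGE = '<div class="lyrics"><p>[Intro]<br/>Ah, mhh (ehi)</p></div>'
NEW_PAGE = '<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">[Intro]<br/>Ah, mhh (ehi)</div>'
NO_LYRICS = '<div class="other">nothing here</div>'

if __name__ == "__main__":
    for page in (OLD_PAGE, NEW_PAGE, NO_LYRICS):
        print(repr(extract_lyrics(page)))
```

With a real page you would pass requests.get(url).content to the function and decide how to handle a None result, for example by retrying the request.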

EDIT: Updated version:

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Avicii-the-nights-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

def get_text(elements):
    text = ''
    for c in elements:
        for t in c.select('a, span'):
            t.unwrap()
        if c:
            c.smooth()
            text += c.get_text(strip=True, separator='\n')
    return text


cs = soup.select('div[class^="Lyrics__Container"]')
if cs:
    text = get_text(cs)
else:
    text = get_text(soup.select('.lyrics'))

print(text)

Prints:

[Verse 1]
(Hey)
Once upon a younger year
When all our shadows disappeared
The animals inside came out to play (Hey)
Hey, went face to face with all our fears
Learned our lessons through the tears
Made memories we knew would never fade
[Pre-Chorus]
One day my father he told me
Son, don't let it slip away

...etc.
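
To see why the updated version unwraps a and span tags before extracting text, here is a small sketch on a hand-written fragment (the SAMPLE markup and both function names are made up for illustration). With separator='\n', every text node becomes its own line, so a link in the middle of a lyric line would split that line in three. Unwrapping the inline tags and calling smooth() merges the neighbouring text nodes back into one. It assumes a BeautifulSoup version that provides Tag.smooth().

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one lyric line with an annotated (linked) word.
SAMPLE = (
    '<div class="Lyrics__Container-demo">Once upon a '
    '<a href="#"><span>younger</span></a> year<br/>'
    'When all our shadows disappeared</div>'
)


def naive_text(html):
    """Extract text directly; inline tags split a lyric line into pieces."""
    div = BeautifulSoup(html, "html.parser").div
    return div.get_text(strip=True, separator="\n")


def unwrapped_text(html):
    """Unwrap inline tags and merge adjacent strings before extracting text."""
    div = BeautifulSoup(html, "html.parser").div
    for tag in div.select("a, span"):
        tag.unwrap()   # replace the tag with its children
    div.smooth()       # merge adjacent text nodes into one
    return div.get_text(strip=True, separator="\n")


if __name__ == "__main__":
    print(naive_text(SAMPLE))
    print("---")
    print(unwrapped_text(SAMPLE))
```

The first call breaks "Once upon a younger year" across three lines, while the second keeps it on one line and still separates it from the next lyric line at the br tag.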

Upvotes: 2

Humayun Ahmad Rajib

Reputation: 1560

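You should use the link https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics instead of https://genius.com/, which you gave as the song's URL.

import requests
from bs4 import BeautifulSoup

def get_lyrics(url):
    lyrics_html = requests.get(url)
    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(lyrics_html.text, "lxml")
    lyrics_text = []
    # This matches the exact class names seen on the page; the suffix looks generated and may change.
    lyrics = soup.find_all('div', class_="Lyrics__Container-sc-1ynbvzw-2 jgQsqn")
    for i in lyrics:
        # Collect the stripped text of each lyrics container.
        lyrics_text.append(i.text.strip())
    return lyrics_text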

output = get_lyrics("https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics")

Output will be:

['[Intro]Ah, mhh (ehi)Ho la bocca pienaVa bene[Verse]Ciao, mi chiamo Luis (eh, eh-eh)Ciao, mi chiamo Luis (eh, eh-eh)Ciao, Ciao mi chiamo Luis (eh, eh-eh)Ciao, mi chiamo LuisSi, si, si SalA a a a Si si si si si siProprio così mi chiamo ioCiao mi chiamo Luis AahLuis Sal, Luis, Luis, Luis SalCiao mi chiamo Luis, Luis SalEeemEeeCiao, Ciao BolognaMi chiamo LuisCiao Mamma (Eee) EeeCiao, Ciao anche a voi LuistiMi chiamo Luis, Lo youtuber EeeEeeCiao, Sono uno youtuberMi chiamo LuisSono uno youtuberEeeCiao, Sono uno youtuberMi chiamo LuisSono uno youtuberA e (Diglielo Luis) a e ă a e e a ă a a a-aaaaCiao mi chiamo LuisEee (Ma chi ti caga)Eee Ciao (Ma chi vuoi che ti guardi)Mi chiamo LuisHahahahaEeeVoglio diventare uno youtuberEee', '', '[Outro]Uuu BolognaDuemila EeeEee EeeEe']
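
In the output above the lyric lines run together and one entry is empty. As a hedged variation, the sketch below matches any class that starts with Lyrics__Container instead of the full class string, joins text nodes with newlines, and skips empty containers. lyric_sections and SAMPLE_PAGE are hypothetical names, and the sample markup is hand-written to mimic the page rather than copied from it.

```python
import re

from bs4 import BeautifulSoup


def lyric_sections(html):
    """Return the non-empty text of each Lyrics__Container div, one string per div."""
    soup = BeautifulSoup(html, "html.parser")
    # A compiled regex passed as class_ is tested against each class name,
    # so the generated suffix does not have to be hard-coded.
    containers = soup.find_all("div", class_=re.compile(r"^Lyrics__Container"))
    sections = []
    for div in containers:
        text = div.get_text(separator="\n", strip=True)
        if text:  # skip containers that hold no lyrics
            sections.append(text)
    return sections


# Hypothetical markup imitating three containers, the middle one empty.
SAMPLE_PAGE = (
    '<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">'
    '[Intro]<br/>Ah, mhh (ehi)<br/>Ho la bocca piena</div>'
    '<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn"></div>'
    '<div class="Lyrics__Container-sc-1ynbvzw-2 jgQsqn">[Outro]<br/>Uuu Bologna</div>'
)

if __name__ == "__main__":
    for section in lyric_sections(SAMPLE_PAGE):
        print(section)
        print()
```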

Upvotes: 0
