Reputation: 7
I am a real novice when it comes to Python, but I am really enjoying the learning process. I am interested in data analysis, and all I am trying to do is scrape the list elements of a Wikipedia page.
I have managed to pull the list and prettify it via:
music_box = soup.find(class_="div-col")
music_rows = music_box.find_all("li")
for row in music_rows:
    print(row.prettify())
Which then provides me with:
<li>
Natalie Beridze – When Dreams Become Responsibility
</li>
<li>
<a href="/wiki/The_Specials" title="The Specials">
The Specials
</a>
– Do Nothing
</li>
<li>
<a href="/wiki/Nine_Inch_Nails" title="Nine Inch Nails">
Nine Inch Nails
</a>
– Your Touch
</li>
<li>
<a href="/wiki/Phosphorescent_(band)" title="Phosphorescent (band)">
Phosphorescent
</a>
– Song for Zula
Essentially, I would like to create a function that pulls just the artist and the song from each row, ideally collecting them into a list or dictionary.
I think I was getting mixed up, though, because I thought this would work:
music_info = {}
for index, row in enumerate(music_rows):
    if index == 0:
        music_info['artist'] = row.find("li").get_text("", strip=True)
(music_info)
I am really new to Python and have been trying to teach myself, so perhaps I haven't quite grasped the fundamentals yet. If this is something you believe I should have known before asking, just say so. This community has been so good to me since I joined this year, so any advice, constructive or not, is appreciated. Thanks, guys! Hope y'all are having a happy Friday.
Upvotes: 0
Views: 63
Reputation: 26
You don't need to call row.find("li") to select <li> again, because you already selected them with music_rows = music_box.find_all("li"). row.find("li") searches inside row (its descendants), not row itself, so here it returns None and the .get_text call fails. You can try this one:
for row in music_rows:
    music_info = {}
    # Split on the first en dash only, so a dash inside a song title does not break the unpacking.
    music_info["artist"], music_info["name"] = map(str.strip, row.get_text("", strip=True).split("–", 1))
    print(music_info)
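Since the question asks for a function that collects the results into a list or dictionary, here is a minimal sketch of one way to do it. The helper name split_artist_song and the sample_texts list are made up for illustration; the sample strings stand in for what row.get_text(" ", strip=True) should return for the rows shown in the question, so the snippet runs without fetching the page.

```python
def split_artist_song(text, sep="–"):
    """Split 'Artist – Song' into a dict.

    Hypothetical helper: sep defaults to the en dash used in the list
    items shown in the question.
    """
    # partition splits on the first separator only and never raises;
    # if the separator is missing, the song part is an empty string.
    artist, _, song = text.partition(sep)
    return {"artist": artist.strip(), "name": song.strip()}


# Stand-in for the text of each <li> row from the question.
sample_texts = [
    "Natalie Beridze – When Dreams Become Responsibility",
    "The Specials – Do Nothing",
    "Nine Inch Nails – Your Touch",
    "Phosphorescent – Song for Zula",
]

# One dict per row, collected into a list.
music_info = [split_artist_song(text) for text in sample_texts]

for entry in music_info:
    print(entry)
```

With the music_rows from the question, the same helper could be applied as music_info = [split_artist_song(row.get_text(" ", strip=True)) for row in music_rows]. A list of dicts like this can also be passed to pandas.DataFrame later if you want a table for analysis.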
Upvotes: 1