Reputation: 109
I would like to take the dates and setlists from this page https://www.zappateers.com/fzshows/78.html and then organize it so it looks like this:
1978 08 26 The Purple Lagoon
1978 08 26 Dancin' Fool
1978 08 26 Easy Meat
1978 08 26 Honey Don't You Want A Man Like Me?
etc. so the date of the show and the song played appear next to each other in columns.
I have this Python code:
import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.zappateers.com/fzshows/78.html').text
soup = BeautifulSoup(source, 'lxml')
for heading in soup.find_all(["h4", "p"]):
    print(heading.text.strip())
that extracts the h4 tag (the date and venue) and the p tag (setlists).
But it just prints it out like this:
1978 08 26 - Festplatz Friedrichsau, Ulm, Germany
Parts of the show were pro-shot, see "Ein Leben als Extravaganza".
The Purple Lagoon, Dancin' Fool, Easy Meat, Honey Don't You Want A Man Like Me?,
What can I do to organize it in columns as indicated above? Thank you
Upvotes: 1
Views: 566
Reputation: 5531
This should help you:
import requests
from bs4 import BeautifulSoup
import pandas as pd
source = requests.get('https://www.zappateers.com/fzshows/78.html').text
soup = BeautifulSoup(source, 'lxml')
text = []
dates = []
headings = soup.find_all('h4')
for index, p in enumerate(soup.find_all('p', class_="setlist")):
    h = headings[index]
    for x in range(len(p.text.strip().split(','))):
        if '-' in h.text:
            dates.append(h.text.strip().split('-')[0].strip())
    if '-' in h.text:
        text.append(p.text.strip().split(','))
text = [item for sublist in text for item in sublist]
df = pd.DataFrame([dates,text]).T
df.columns = ['Date','Title']
print(df)
Output:
Date Title
0 1978 08 26 The Purple Lagoon
1 1978 08 26 Dancin' Fool
2 1978 08 26 Easy Meat
3 1978 08 26 Honey Don't You Want A Man Like Me?
4 1978 08 26 Keep It Greasey
.. ... ...
462 1978 10 31 Suicide Chump
463 1978 10 31 Improvisations In Q*
464 1978 10 31 Why Does It Hurt When I Pee?
465 1978 10 31 Improvisations (incl. Hail Caesar Theme)
466 1978 10 31 Magic Fingers
[467 rows x 2 columns]
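Splitting on ',' alone leaves the leading space on every title after the first, and a setlist that ends with a comma produces an empty title. As a minimal sketch of the per-show step (the helper name rows_for_show is hypothetical, and it assumes the heading separates the date from the venue with " - "), the pairing could be written with plain string handling:

```python
def rows_for_show(heading_text, setlist_text):
    """Return (date, title) pairs for one show."""
    # Assumption: the date is everything before the first " - " in the heading.
    date = heading_text.split(" - ", 1)[0].strip()
    # Strip each title; a trailing comma leaves an empty string, filtered below.
    titles = [title.strip() for title in setlist_text.split(",")]
    return [(date, title) for title in titles if title]


sample_rows = rows_for_show(
    "1978 08 26 - Festplatz Friedrichsau, Ulm, Germany",
    "The Purple Lagoon, Dancin' Fool, Easy Meat, Honey Don't You Want A Man Like Me?,",
)
for date, title in sample_rows:
    print(date, title)
```

In the scraper, heading_text and setlist_text would be h.text and p.text. Note that a song title that itself contains a comma would still be split in two by this approach.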
You can also save this data to a CSV file by adding this line to your code:
df.to_csv('D:\\Songs.csv',index = False)
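If pandas is not needed for anything else, the standard-library csv module can write the same two columns. This is only a sketch: the helper name rows_to_csv is hypothetical, and it writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io


def rows_to_csv(rows, handle):
    """Write (date, title) pairs to an open text handle as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["Date", "Title"])  # header row, same names as the DataFrame columns
    writer.writerows(rows)              # titles containing commas are quoted automatically


buffer = io.StringIO()
rows_to_csv(
    [("1978 08 26", "The Purple Lagoon"), ("1978 08 26", "Dancin' Fool")],
    buffer,
)
print(buffer.getvalue())
```

To write a real file instead, pass a handle opened with open(path, "w", newline="") so the csv module controls the line endings.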
Upvotes: 2