Tzuriel

Reputation: 109

Organizing html data scraped using BeautifulSoup and Python

I would like to take the dates and setlists from this page, https://www.zappateers.com/fzshows/78.html, and organize them so they look like this:

1978 08 26 The Purple Lagoon

1978 08 26 Dancin' Fool

1978 08 26 Easy Meat

1978 08 26 Honey Don't You Want A Man Like Me?

and so on, so that the date of the show and the song played appear next to each other in columns.

I have this Python code:

import requests
from bs4 import BeautifulSoup
source = requests.get('https://www.zappateers.com/fzshows/78.html').text
soup = BeautifulSoup(source, 'lxml')

for heading in soup.find_all(["h4", "p"]):
    print(heading.text.strip())

It extracts the h4 tags (the date and venue) and the p tags (the setlists).

But it just prints it out like this:

1978 08 26 - Festplatz Friedrichsau, Ulm, Germany Parts of the show were pro-shot, see "Ein Leben als Extravaganza". The Purple Lagoon, Dancin' Fool, Easy Meat, Honey Don't You Want A Man Like Me?,

What can I do to organize the output in columns as indicated above? Thank you.

Upvotes: 1

Views: 566

Answers (1)

Sushil

Reputation: 5531

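This should help. It pairs each setlist paragraph with the h4 heading at the same position, splits the setlist on commas, and repeats the show's date once per song: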

import requests
from bs4 import BeautifulSoup
import pandas as pd

source = requests.get('https://www.zappateers.com/fzshows/78.html').text
soup = BeautifulSoup(source, 'lxml')

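text = []
dates = []

headings = soup.find_all('h4')

# Pair each setlist paragraph with the heading at the same position.
# This assumes the page has one setlist paragraph per h4 heading, in order.
for index, p in enumerate(soup.find_all('p', class_="setlist")):
    h = headings[index]
    if '-' not in h.text:
        continue  # skip headings that are not in "date - venue" form
    # The date is everything before the first '-'
    date = h.text.strip().split('-')[0].strip()
    # One entry per comma-separated song, with surrounding whitespace removed
    songs = [song.strip() for song in p.text.strip().split(',')]
    dates.extend([date] * len(songs))
    text.extend(songs)

df = pd.DataFrame({'Date': dates, 'Title': text})
print(df)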

Output:

           Date                                      Title
0    1978 08 26                          The Purple Lagoon
1    1978 08 26                               Dancin' Fool
2    1978 08 26                                  Easy Meat
3    1978 08 26        Honey Don't You Want A Man Like Me?
4    1978 08 26                            Keep It Greasey
..          ...                                        ...
462  1978 10 31                              Suicide Chump
463  1978 10 31                       Improvisations In Q*
464  1978 10 31               Why Does It Hurt When I Pee?
465  1978 10 31   Improvisations (incl. Hail Caesar Theme)
466  1978 10 31                              Magic Fingers

[467 rows x 2 columns]
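
The splitting step can also be tried without any network access. The following is only a minimal sketch using the sample line from the question; the helper name setlist_rows is hypothetical, and it assumes that the date is everything before the first '-' in the heading and that song titles never contain commas:

```python
def setlist_rows(heading, setlist):
    """Return (date, title) pairs for one show.

    heading: text of the h4 tag, e.g. "1978 08 26 - Venue, City"
    setlist: text of the setlist paragraph, songs separated by commas
    """
    # Assumption: everything before the first '-' is the date
    date = heading.split('-', 1)[0].strip()
    # Strip each title and drop empty pieces, e.g. after a trailing comma
    titles = [t.strip() for t in setlist.split(',') if t.strip()]
    return [(date, title) for title in titles]


heading = "1978 08 26 - Festplatz Friedrichsau, Ulm, Germany"
setlist = "The Purple Lagoon, Dancin' Fool, Easy Meat, Honey Don't You Want A Man Like Me?,"

for date, title in setlist_rows(heading, setlist):
    print(date, title)
```

Unlike the code above, this sketch drops empty pieces, so the row count on the real page may differ slightly if a setlist ends with a comma.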

You can also write the results to a CSV file by adding this line to your code:

df.to_csv('D:\\Songs.csv', index=False)
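
If pandas is not needed for anything else, the standard library's csv module can probably produce the same two-column file. This is a rough sketch, not part of the original answer: the rows list and the write_rows helper are made up for illustration, and the output goes to an in-memory buffer so it can be inspected before touching the disk:

```python
import csv
import io

# Hypothetical sample rows in the same (date, title) shape as the DataFrame
rows = [
    ("1978 08 26", "The Purple Lagoon"),
    ("1978 08 26", "Dancin' Fool"),
]


def write_rows(rows, handle):
    """Write a header line followed by one CSV line per (date, title) pair."""
    writer = csv.writer(handle)
    writer.writerow(["Date", "Title"])
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a handle opened with open('Songs.csv', 'w', newline='') to write_rows; the newline='' argument is what the csv module documentation recommends to avoid extra blank lines on Windows.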

Upvotes: 2
