Akrej

Reputation: 31

How do I extract values from an HTML table with Python and Beautiful Soup?

How can I extract the values from the table below? I need the date, time, reserve, and play values. Every time I try, I only get one list containing the whole table, and I don't know how to pick out the individual values from it. Thank you very much for your help.

<table class="list">
    <tr class="head">
        <th>Date</th>
        <th>Time</th>
        <th>Play</th>
        <th>Tickets</th>
        <th>&nbsp;</th>
    </tr>
    <tr class="t1">
        <th>Th
            03. 09. 2020</th>
        <td>
            19:00</td>
        <td>Racek</td>
        <td class="center">4</td>
        <td>
            <a href="/rezervace/detail?id=2618"
               title="Reserve tickets for this performance">
                reserve
            </a>
        </td>
    </tr>
</table>

Upvotes: 1

Views: 59

Answers (3)

Wazaa

Reputation: 146

A simple way is to use pandas:

from io import StringIO

import pandas as pd

table = """
<table class="list">
    <tr class="head">
        <th>Date</th>
        <th>Time</th>
        <th>Play</th>
        <th>Tickets</th>
        <th>&nbsp;</th>
    </tr>
    <tr class="t1">
        <th>Th
            03. 09. 2020</th>
        <td>
            19:00</td>
        <td>Racek</td>
        <td class="center">4</td>
        <td>
            <a href="/rezervace/detail?id=2618" title="Reserve tickets for this performance">
                reserve
            </a>
        </td>
    </tr>
</table>
"""

# Recent pandas versions deprecate passing a literal HTML string, so wrap it in StringIO.
df = pd.read_html(StringIO(table))[0]

Then you can access the data through the columns of "df":

df["Date"]
# 0    Th  03. 09. 2020
# Name: Date, dtype: object
df["Time"]
# 0    19:00
# Name: Time, dtype: object
df["Play"]
# 0    Racek
# Name: Play, dtype: object
df["Tickets"]
# 0    4
# Name: Tickets, dtype: int64
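
The column lookups above return whole Series. To get the single values the question asks for, one option is to clean the column and then read the first row. This is only a sketch: it assumes the table has the same columns as the sample, and that an HTML parser backend for pandas (such as lxml) is installed. The variable names `first`, `date`, `time`, and `play` are chosen here for illustration.

```python
from io import StringIO

import pandas as pd

table = """
<table class="list">
    <tr class="head">
        <th>Date</th><th>Time</th><th>Play</th><th>Tickets</th><th>&nbsp;</th>
    </tr>
    <tr class="t1">
        <th>Th
            03. 09. 2020</th>
        <td>
            19:00</td>
        <td>Racek</td>
        <td class="center">4</td>
        <td>
            <a href="/rezervace/detail?id=2618" title="Reserve tickets for this performance">
                reserve
            </a>
        </td>
    </tr>
</table>
"""

df = pd.read_html(StringIO(table))[0]

# The Date cell spans two lines in the HTML, so collapse its inner whitespace.
df["Date"] = df["Date"].str.split().str.join(" ")

# Take the first data row and read single values by column name.
first = df.iloc[0]
date, time, play = first["Date"], first["Time"], first["Play"]
print(date, time, play)
```

For several rows, `df.itertuples()` or `df.to_dict("records")` would give the same values row by row. Note that this approach keeps only the cell text, so the link behind "reserve" is not available here.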

Upvotes: 0

Andrej Kesely

Reputation: 195428

This script parses the table with BeautifulSoup and then prints the individual rows to the screen:

import re
from bs4 import BeautifulSoup

html = '''
<table class="list">
     <tr class="head">
          <th>Date</th>
          <th>Time</th>
          <th>Play</th>
          <th>Tickets</th>
          <th>&nbsp;</th>
     </tr>
     <tr class="t1">
          <th>Th
          03. 09. 2020</th>
          <td>
          19:00</td>
          <td>Racek</td>
          <td class="center">4</td>
          <td>
               <a href="/rezervace/detail?id=2618"
                    title="Reserve tickets for this performance">
                    reserve
               </a>
          </td>
     </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

all_data = []
for row in soup.select('tr'):
    all_data.append([re.sub(r'\s{2,}', ' ', d.get_text(strip=True)) for d in row.select('td, th')])

# print data to screen:

# print header:
print('{:<25}{:<15}{:<15}{:<15}{:<15}'.format(*all_data[0]))

# print rows:
for date, time, play, tickets, reserve in all_data[1:]:
    print('{:<25}{:<15}{:<15}{:<15}{:<15}'.format(date, time, play, tickets, reserve))

Prints:

Date                     Time           Play           Tickets                       
Th 03. 09. 2020          19:00          Racek          4              reserve        
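
If the goal is to work with the values rather than print them, and to get the URL behind the "reserve" link instead of its text, each row can be collected into a dictionary. The following is a minimal sketch, assuming every data row has the same five cells as the sample. The names `clean`, `parse_rows`, and `reserve_url` are made up for illustration and are not part of BeautifulSoup.

```python
import re
from bs4 import BeautifulSoup

html = '''
<table class="list">
     <tr class="head">
          <th>Date</th><th>Time</th><th>Play</th><th>Tickets</th><th>&nbsp;</th>
     </tr>
     <tr class="t1">
          <th>Th
          03. 09. 2020</th>
          <td>
          19:00</td>
          <td>Racek</td>
          <td class="center">4</td>
          <td>
               <a href="/rezervace/detail?id=2618"
                    title="Reserve tickets for this performance">
                    reserve
               </a>
          </td>
     </tr>
</table>
'''


def clean(text):
    # Collapse every run of whitespace (including newlines) into one space.
    return re.sub(r'\s+', ' ', text).strip()


def parse_rows(markup):
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    for tr in soup.find_all('tr'):
        # Skip the header row, which is marked with class="head".
        if 'head' in (tr.get('class') or []):
            continue
        cells = tr.find_all(['th', 'td'], recursive=False)
        if len(cells) < 5:
            continue  # assumption: data rows always have five cells
        link = cells[4].find('a')
        rows.append({
            'date': clean(cells[0].get_text()),
            'time': clean(cells[1].get_text()),
            'play': clean(cells[2].get_text()),
            'tickets': clean(cells[3].get_text()),
            # Keep the link target; None if the row has no reserve link.
            'reserve_url': link.get('href') if link else None,
        })
    return rows


rows = parse_rows(html)
for row in rows:
    print(row['date'], row['time'], row['play'], row['reserve_url'])
```

The `href` is relative, so it would still need to be joined with the site's base URL (for example with `urllib.parse.urljoin`) before it can be requested.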

Upvotes: 0

dabingsou

Reputation: 2469

First, you should post the code you've tried yourself. That said, here is another way, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc
html = '''
<table class="list">
     <tr class="head">
          <th>Date</th>
          <th>Time</th>
          <th>Play</th>
          <th>Tickets</th>
          <th>&nbsp;</th>
     </tr>
     <tr class="t1">
          <th>Th
          03. 09. 2020</th>
          <td>
          19:00</td>
          <td>Racek</td>
          <td class="center">4</td>
          <td>
               <a href="/rezervace/detail?id=2618"
                    title="Reserve tickets for this performance">
                    reserve
               </a>
          </td>
     </tr>
</table>
'''
doc = SimplifiedDoc(html)
# First method
table = doc.getTable('table')
print(table)

# Second method
table = doc.getElement('table', attr='class', value='list').trs.children.text
print(table)

Result:

[['Date', 'Time', 'Play', 'Tickets', ''], ['Th 03. 09. 2020', '19:00', 'Racek', '4', 'reserve']]
[['Date', 'Time', 'Play', 'Tickets', ''], ['Th 03. 09. 2020', '19:00', 'Racek', '4', 'reserve']]

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples

Upvotes: 1
