Reputation: 31
How can I extract the individual values from this table? I need the date, time, play and reserve values. Every time I try, I only get the whole table back as one list, and I don't know how to pick out the individual values from it. Thank you very much for your help.
<table class="list">
<tr class="head">
<th>Date</th>
<th>Time</th>
<th>Play</th>
<th>Tickets</th>
<th> </th>
</tr>
<tr class="t1">
<th>Th
03. 09. 2020</th>
<td>
19:00</td>
<td>Racek</td>
<td class="center">4</td>
<td>
<a href="/rezervace/detail?id=2618"
title="Reserve tickets for this performance">
reserve
</a>
</td>
</tr>
</table>
Upvotes: 1
Views: 59
Reputation: 146
A simple way is to use pandas:
import pandas as pd
table = """
<table class="list">
<tr class="head">
<th>Date</th>
<th>Time</th>
<th>Play</th>
<th>Tickets</th>
<th> </th>
</tr>
<tr class="t1">
<th>Th
03. 09. 2020</th>
<td>
19:00</td>
<td>Racek</td>
<td class="center">4</td>
<td>
<a href="/rezervace/detail?id=2618" title="Reserve tickets for this performance">
reserve
</a>
</td>
</tr>
</table>
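"""
# Recent pandas versions deprecate passing literal HTML to read_html,
# so wrap the string in StringIO (this also works on older versions).
from io import StringIO
df = pd.read_html(StringIO(table))[0]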
Then you can access each column of "df" by name:
df["Date"]
# 0 Th 03. 09. 2020
# Name: Date, dtype: object
df["Time"]
# 0 19:00
# Name: Time, dtype: object
df["Play"]
# 0 Racek
# Name: Play, dtype: object
df["Tickets"]
# 0 4
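To get at single values rather than whole columns, something like the following may help. It is only a sketch: the DataFrame here is built by hand as a stand-in for the "df" returned by read_html above (an assumption, so that the snippet runs on its own), and it leaves out the unnamed fifth column.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame produced by pd.read_html above
# (assumption: same column names and a single data row).
df = pd.DataFrame({
    "Date": ["Th 03. 09. 2020"],
    "Time": ["19:00"],
    "Play": ["Racek"],
    "Tickets": [4],
})

# Single cell: row label 0, column name
date = df.loc[0, "Date"]
time = df.loc[0, "Time"]
play = df.loc[0, "Play"]
tickets = int(df.loc[0, "Tickets"])  # convert the NumPy integer to a plain int

# Every row as a plain dict, handy for looping over performances
records = df.to_dict("records")

print(date, time, play, tickets)
print(records)
```

With the real table, `records` would contain one dict per performance row.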
Upvotes: 0
Reputation: 195428
This script will parse the table with BeautifulSoup
and then print individual rows to screen:
import re
from bs4 import BeautifulSoup
html = '''
<table class="list">
<tr class="head">
<th>Date</th>
<th>Time</th>
<th>Play</th>
<th>Tickets</th>
<th> </th>
</tr>
<tr class="t1">
<th>Th
03. 09. 2020</th>
<td>
19:00</td>
<td>Racek</td>
<td class="center">4</td>
<td>
<a href="/rezervace/detail?id=2618"
title="Reserve tickets for this performance">
reserve
</a>
</td>
</tr>
</table>
'''
soup = BeautifulSoup(html, 'html.parser')
all_data = []
for row in soup.select('tr'):
    # collapse every run of whitespace (including a lone newline) into one space
    all_data.append([re.sub(r'\s+', ' ', d.get_text(strip=True)) for d in row.select('td, th')])
# print data to screen:
# print header:
print('{:<25}{:<15}{:<15}{:<15}{:<15}'.format(*all_data[0]))
# print rows:
for date, time, play, tickets, reserve in all_data[1:]:
    print('{:<25}{:<15}{:<15}{:<15}{:<15}'.format(date, time, play, tickets, reserve))
Prints:
Date                     Time           Play           Tickets
Th 03. 09. 2020          19:00          Racek          4              reserve
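If the reservation link itself is needed and not just the word "reserve", the same BeautifulSoup approach can be extended roughly like this. It is a sketch under a few assumptions: every data row has five cells in the order shown, the ticket count is always a number, and the dictionary keys (date, time, play, tickets, reserve_url) are names made up for this example.

```python
from bs4 import BeautifulSoup

html = '''
<table class="list">
<tr class="head">
<th>Date</th><th>Time</th><th>Play</th><th>Tickets</th><th> </th>
</tr>
<tr class="t1">
<th>Th
03. 09. 2020</th>
<td>
19:00</td>
<td>Racek</td>
<td class="center">4</td>
<td>
<a href="/rezervace/detail?id=2618"
title="Reserve tickets for this performance">
reserve
</a>
</td>
</tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

performances = []
for row in soup.find_all('tr'):
    # skip the header row, which carries class="head"
    if 'head' in row.get('class', []):
        continue
    cells = row.find_all(['th', 'td'])
    if len(cells) < 5:
        continue  # not a full data row
    # ' '.join(text.split()) collapses newlines and repeated spaces
    date, time, play, tickets = (' '.join(c.get_text().split()) for c in cells[:4])
    link = cells[4].find('a')
    performances.append({
        'date': date,
        'time': time,
        'play': play,
        'tickets': int(tickets),  # assumes the cell always holds a number
        'reserve_url': link['href'] if link else None,
    })

print(performances)
```

Each entry in `performances` can then be read by key, for example `performances[0]['date']`.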
Upvotes: 0
Reputation: 2469
First, you should post some code that you've tried yourself. But anyway, here's another way for you.
from simplified_scrapy import SimplifiedDoc
html = '''
<table class="list">
<tr class="head">
<th>Date</th>
<th>Time</th>
<th>Play</th>
<th>Tickets</th>
<th> </th>
</tr>
<tr class="t1">
<th>Th
03. 09. 2020</th>
<td>
19:00</td>
<td>Racek</td>
<td class="center">4</td>
<td>
<a href="/rezervace/detail?id=2618"
title="Reserve tickets for this performance">
reserve
</a>
</td>
</tr>
</table>
'''
doc = SimplifiedDoc(html)
# First method
table = doc.getTable('table')
print(table)
# Second method
table = doc.getElement('table', attr='class', value='list').trs.children.text
print(table)
Result:
[['Date', 'Time', 'Play', 'Tickets', ''], ['Th 03. 09. 2020', '19:00', 'Racek', '4', 'reserve']]
[['Date', 'Time', 'Play', 'Tickets', ''], ['Th 03. 09. 2020', '19:00', 'Racek', '4', 'reserve']]
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
Upvotes: 1