Reputation: 669
I'm trying to scrape the table data from this website: https://www.playnj.com/atlantic-city/revenue/
Yet when I try to print the table, it returns None. Can someone assist me with this?
Here is my code:
from bs4 import BeautifulSoup
import requests
import pandas as pd
base_url = 'https://www.playnj.com/atlantic-city/revenue/'
resp = requests.get(base_url)
soup = BeautifulSoup(resp.text, "html.parser")
october_table = soup.find('table', {'id': 'tablepress-342-no-2'})
print(october_table)
This prints None, and I am unsure why. Ideally (and perhaps I am wrong here), if my objective is to get all the data from all the tables, it would be more efficient to select them by the class wrapper they all share, so I would use the following two lines instead:
all_tables = soup.findAll('table', {'class': 'dataTables_wrapper no-footer'})
print(all_tables)
However, this also finds nothing. Any help here would be immensely appreciated.
Upvotes: 0
Views: 210
Reputation: 11505
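from io import StringIO
import pandas as pd
import requests
# The site appears to reject requests that lack a User-Agent header
headers = {"User-Agent": "Mozilla/5.0"}
html = requests.get(
    "https://www.playnj.com/atlantic-city/revenue/", headers=headers).text
# Wrap the HTML in StringIO: newer pandas versions deprecate passing a literal HTML string
df = pd.read_html(StringIO(html))[0]
df.to_csv("out.csv", index=False)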
Output:
          Casino  Table & Other       Poker  Slot Machines  Total Gaming Win
0        Bally's     $3,441,617    $183,255     $9,780,559       $13,405,431
1        Borgata    $16,744,564  $1,631,575    $40,669,801       $59,045,940
2        Caesars    $13,785,260         $ -    $14,530,482       $28,315,742
3  Golden Nugget     $5,237,258     $92,647    $11,728,116       $17,058,021
4      Hard Rock     $7,155,391         $ -    $16,338,090       $23,493,481
5       Harrah's     $5,555,330    $222,323    $19,794,846       $25,572,499
6   Ocean Resort     $4,965,900     $82,686    $14,459,903       $19,508,489
7        Resorts     $3,328,916         $ -    $10,566,342       $13,895,258
8      Tropicana     $4,531,234    $159,957    $18,957,670       $23,648,861
9          Total    $64,745,470  $2,372,443   $156,825,809      $223,943,722
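The dollar amounts in this output are plain strings, and "$ -" marks a cell with no value, so they usually need cleaning before any arithmetic. Here is a minimal sketch, assuming the cell formats shown in the output above; `parse_dollars` is a hypothetical helper name, not something pandas provides:

```python
def parse_dollars(text):
    """Convert a cell such as '$3,441,617' to an int; '$ -' (no value) becomes None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    if cleaned in ("", "-"):
        return None
    return int(cleaned)


# One row copied from the output above (Caesars has no poker revenue listed).
row = ["Caesars", "$13,785,260", "$ -", "$14,530,482", "$28,315,742"]
amounts = [parse_dollars(cell) for cell in row[1:]]

# Sanity check: the per-category amounts should add up to the row total.
parts_sum = sum(value for value in amounts[:-1] if value is not None)
print(amounts, parts_sum == amounts[-1])
```

With a DataFrame, the same helper could probably be applied per column, for example `df["Poker"].map(parse_dollars)`, though that depends on the column names the page actually returns.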
Upvotes: 2
Reputation: 142641
It seems this page checks the User-Agent header.
It works even with an incomplete value such as "User-Agent": "Mozilla/5.0".
BTW: this table has a different ID: 'id': 'tablepress-342'
import requests
from bs4 import BeautifulSoup
url = 'https://www.playnj.com/atlantic-city/revenue/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
print(r.status_code)
soup = BeautifulSoup(r.text, "html.parser")
october_table = soup.find('table', {'id': 'tablepress-342'})
#print(october_table)
for row in october_table.find_all('tr'):
    for item in row.find_all('td'):
        print(item.text)
    print('---')
Result
200
---
Bally's
$3,799,907
$180,229
$9,107,610
$13,087,746
---
Borgata
$14,709,145
$1,060,246
$35,731,777
$51,501,168
---
Caesars
$7,097,502
$ -
$14,689,045
$21,786,547
---
Golden Nugget
$3,311,223
$84,387
$11,356,285
$14,751,895
---
Hard Rock
$7,849,617
$ -
$16,619,183
$24,468,800
---
Harrah's
$4,507,262
$205,921
$19,372,672
$24,085,855
---
Ocean Resort
$5,116,397
$65,276
$13,245,998
$18,427,671
---
Resorts
$2,257,149
$ -
$9,859,813
$12,116,962
---
Tropicana
$4,377,139
$152,876
$17,501,139
$22,031,154
---
Total
$53,025,341
$1,748,935
$147,483,522
$202,257,798
---
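To see why the exact ID matters without hitting the site, here is a rough offline sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs with no extra packages. `SAMPLE_HTML` is a made-up, trimmed stand-in for the page's markup (only the ID and two values are taken from the result above), and `TableRows` and `extract_rows` are hypothetical names:

```python
from html.parser import HTMLParser

# Hypothetical, trimmed-down stand-in for the page's markup (not the real HTML).
SAMPLE_HTML = """
<table id="tablepress-342" class="tablepress">
  <thead><tr><th>Casino</th><th>Total Gaming Win</th></tr></thead>
  <tbody>
    <tr><td>Borgata</td><td>$51,501,168</td></tr>
    <tr><td>Caesars</td><td>$21,786,547</td></tr>
  </tbody>
</table>
"""


class TableRows(HTMLParser):
    """Collect the cell text of every row in the table with the given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])          # start a new row
        elif self.in_table and tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")      # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        # Only text inside a cell of the matching table is kept.
        if self.in_table and self.in_cell:
            self.rows[-1][-1] += data


def extract_rows(html, table_id):
    parser = TableRows(table_id)
    parser.feed(html)
    return parser.rows


print(extract_rows(SAMPLE_HTML, "tablepress-342"))       # rows of the matching table
print(extract_rows(SAMPLE_HTML, "tablepress-342-no-2"))  # no table has this id
```

The second call comes back empty because no element in the markup carries that ID, which is the same situation that makes `soup.find(...)` return None in the question. This sketch ignores nested tables and malformed markup, so BeautifulSoup remains the more practical choice for the real page.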
Upvotes: 1
Reputation: 12255
Request with headers:
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'ru-RU,ru;q=0.8,en;q=0.6,en-US;q=0.4,tr;q=0.2',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
}
resp = requests.get('https://www.playnj.com/atlantic-city/revenue/', headers=headers)
soup = BeautifulSoup(resp.text, "html.parser")
# Select every <table> element that carries the "tablepress" class
tables = soup.select('table.tablepress')
Upvotes: 1