Reputation: 1
I want to use Python to get all the tables on the website 'https://www.tgju.org/archive/price_dollar_rl', and I wrote:
import requests
import pandas as pd

url = 'https://www.tgju.org/archive/price_dollar_rl'
html = requests.get(url).content
df_list = pd.read_html(html)   # list of every table pandas finds in the page
df = df_list[-1]               # keeps only the last table from that list
print(df)
df.to_csv('my data.csv')
But only one of the 95 tables is saved. What should I do to save all the tables?
Upvotes: 0
Views: 128
Reputation: 195553
To get all pages, you can simulate the Ajax request and load the data directly from the API:
import requests
import pandas as pd
from bs4 import BeautifulSoup
query = {
    "lang": "fa",
    "order_dir": ["asc", ""],
    "draw": "9",
    "columns[0][data]": "0",
    "columns[0][name]": "",
    "columns[0][searchable]": "true",
    "columns[0][orderable]": "true",
    "columns[0][search][value]": "",
    "columns[0][search][regex]": "false",
    "columns[1][data]": "1",
    "columns[1][name]": "",
    "columns[1][searchable]": "true",
    "columns[1][orderable]": "true",
    "columns[1][search][value]": "",
    "columns[1][search][regex]": "false",
    "columns[2][data]": "2",
    "columns[2][name]": "",
    "columns[2][searchable]": "true",
    "columns[2][orderable]": "true",
    "columns[2][search][value]": "",
    "columns[2][search][regex]": "false",
    "columns[3][data]": "3",
    "columns[3][name]": "",
    "columns[3][searchable]": "true",
    "columns[3][orderable]": "true",
    "columns[3][search][value]": "",
    "columns[3][search][regex]": "false",
    "columns[4][data]": "4",
    "columns[4][name]": "",
    "columns[4][searchable]": "true",
    "columns[4][orderable]": "true",
    "columns[4][search][value]": "",
    "columns[4][search][regex]": "false",
    "columns[5][data]": "5",
    "columns[5][name]": "",
    "columns[5][searchable]": "true",
    "columns[5][orderable]": "true",
    "columns[5][search][value]": "",
    "columns[5][search][regex]": "false",
    "columns[6][data]": "6",
    "columns[6][name]": "",
    "columns[6][searchable]": "true",
    "columns[6][orderable]": "true",
    "columns[6][search][value]": "",
    "columns[6][search][regex]": "false",
    "columns[7][data]": "7",
    "columns[7][name]": "",
    "columns[7][searchable]": "true",
    "columns[7][orderable]": "true",
    "columns[7][search][value]": "",
    "columns[7][search][regex]": "false",
    "start": "0",
    "length": "30",
    "search": "",
    "order_col": "",
    "from": "",
    "to": "",
    "convert_to_ad": "1",
    # "_": "1624699477042"
}
url = "https://api.accessban.com/v1/market/indicator/summary-table-data/price_dollar_rl"
out = []
for start in range(0, 10):  # <-- increase number of pages here
    print("Getting page {}...".format(start))
    query["start"] = start * 30  # each page holds 30 rows ("length": "30")
    data = requests.get(url, params=query).json()
    out.extend(data["data"])

df = pd.DataFrame(out)

# columns 4 and 5 contain HTML markup - keep only the visible text
df[4] = df[4].apply(lambda x: BeautifulSoup(x, "html.parser").text)
df[5] = df[5].apply(lambda x: BeautifulSoup(x, "html.parser").text)

print(df)
df.to_csv("data.csv", index=False)
Prints:
0 1 2 3 4 5 6 7
0 241,690 241,190 242,440 241,890 100 0.04% 2021/06/24 1400/04/3
1 243,310 240,790 243,340 241,790 880 0.36% 2021/06/23 1400/04/2
2 241,190 241,190 243,140 242,670 1680 0.7% 2021/06/22 1400/04/1
3 239,940 239,190 241,440 240,990 1390 0.58% 2021/06/21 1400/03/31
4 234,810 234,690 240,440 239,600 4710 2.01% 2021/06/20 1400/03/30
5 244,490 234,690 244,640 234,890 9400 4% 2021/06/19 1400/03/29
6 242,010 241,950 244,640 244,290 2540 1.05% 2021/06/17 1400/03/27
7 240,470 239,450 242,250 241,750 1260 0.52% 2021/06/16 1400/03/26
8 239,970 239,950 240,050 240,490 770 0.32% 2021/06/15 1400/03/25
9 240,970 238,550 241,050 239,720 1310 0.55% 2021/06/14 1400/03/24
10 238,970 238,940 241,250 241,030 3280 1.38% 2021/06/13 1400/03/23
11 236,830 236,140 238,350 237,750 1480 0.62% 2021/06/12 1400/03/22
12 240,010 239,140 240,450 239,230 2210 0.92% 2021/06/10 1400/03/20
...
And saves data.csv (screenshot from LibreOffice).
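If you would rather not hard-code the number of pages, you can keep requesting until the endpoint returns an empty data list. This is only a sketch that reuses the url and query from above; treating an empty list as the end of the archive is an assumption about how this API behaves past the last page:

out = []
start = 0
while True:
    query["start"] = start * 30
    rows = requests.get(url, params=query).json().get("data", [])
    if not rows:  # assumed: an empty page means we have passed the last record
        break
    out.extend(rows)
    start += 1

df = pd.DataFrame(out)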
Upvotes: 1
Reputation: 1186
Well, first of all, there is a difference between the URL in the text and the URL in the code.
Second, the site uses pagination, so you'd need something like Selenium to press the "next" button automatically from a script, fetch the HTML of each page, and then convert it to CSV. A rough sketch of that approach is below.
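A minimal sketch of that idea, assuming Chrome, that pandas.read_html can parse the rendered table, and that the pager's "next" button is reachable with the CSS selector below (the selector is a guess and has to be checked against the actual page):

import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.tgju.org/archive/price_dollar_rl")

frames = []
for _ in range(10):  # number of pages to walk through
    time.sleep(2)  # wait for the table to render
    frames.extend(pd.read_html(driver.page_source))
    # the selector is an assumption - inspect the page to find the real "next" button
    driver.find_element(By.CSS_SELECTOR, ".paginate_button.next").click()

driver.quit()
pd.concat(frames).to_csv("data.csv", index=False)

This is slower and more fragile than calling the API directly, but it works when no usable API endpoint is available.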
Upvotes: 0