Reputation: 10011
I want to iterate over the tables at the link here, extract them, and then concatenate or append them into a single dataframe.
I have used a loop to iterate over the tables, but I'm not sure how to append all of the json or dataframe results into one.
Could anyone help? Thank you.
import requests
import json
import pandas as pd
import numpy as np
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()
for page in range(0, 5):
    # Request one page of 100 rows at a time
    data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
    json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers=headers, data=data).text
    d = json.loads(json_arr)
    # df is overwritten on every iteration; the pages are never combined
    df = pd.read_json(json.dumps(d['rows']), orient='list')
Reference related: Iterate and extract tables from web saving as excel file in Python
Upvotes: 1
Views: 3516
Reputation: 2643
Use concat:
import requests
import json
import pandas as pd
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}
dfs = pd.DataFrame()
for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json',
        headers=headers,
        data=data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']), orient='list')
    # Put the new page after the pages collected so far to keep page order
    dfs = pd.concat([dfs, df], sort=False)
Or,
import requests
import json
import pandas as pd
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}
dfs = []
for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json',
        headers=headers,
        data=data).text
    d = json.loads(json_arr)
    dfs.append(pd.read_json(json.dumps(d['rows']), orient='list'))
# Concatenate once, after the loop has collected every page
df = pd.concat(dfs, sort=False)
PS: The second block is strongly preferred, because you should never call DataFrame.append or pd.concat inside a for-loop: each call copies all the rows accumulated so far, which leads to quadratic copying. Thanks @parfait!
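To try the collect-then-concat pattern without hitting the site, here is a minimal offline sketch. The `fetch_rows` function is a hypothetical stand-in for the `requests.post` call, and its `id` and `name` fields are made up for illustration; the real endpoint returns different columns. It also assumes each page's rows can be passed straight to `pd.DataFrame`.

```python
import pandas as pd


def fetch_rows(page, page_size=3):
    """Hypothetical stand-in for the HTTP request: returns one page of rows."""
    start = page * page_size
    return [
        {"id": start + i, "name": f"project-{start + i}"}
        for i in range(page_size)
    ]


def collect_pages(n_pages, page_size=3):
    """Build one DataFrame per page, then concatenate them in a single call."""
    frames = []
    for page in range(n_pages):
        rows = fetch_rows(page, page_size)
        # Appending to a Python list is cheap; no DataFrame is copied here
        frames.append(pd.DataFrame(rows))
    # ignore_index=True renumbers the rows 0..n-1 across all pages
    return pd.concat(frames, ignore_index=True, sort=False)


combined = collect_pages(5)
print(combined.shape)
```

Passing `ignore_index=True` is optional; without it, each page keeps its own 0-based index, so the combined frame has repeated index labels.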
Upvotes: 2