ah bon

Reputation: 10011

Iterate append json and save as dataframe in Python

I want to iterate over and extract the tables from the link here, then concatenate or append them and save the result as a dataframe.

I have used a loop to iterate over the tables, but I'm not sure how to append all of the JSON responses or dataframes into one.

Could anyone help? Thank you.

import requests
import json
import pandas as pd
import numpy as np

headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
        "Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()

for page in range(0, 5):
    data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
    json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers = headers, data = data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']), orient='list')

Reference related: Iterate and extract tables from web saving as excel file in Python

Upvotes: 1

Views: 3516

Answers (1)

E. Zeytinci

Reputation: 2643

Use pd.concat:

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}

dfs = pd.DataFrame()

for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json', 
        headers=headers, 
        data=data).text
    d = json.loads(json_arr)
    df = pd.read_json(json.dumps(d['rows']), orient='list')
    dfs = pd.concat([df, dfs], sort=False)

Or,

import requests
import json
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
    'Referer': 'http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp'
}

dfs = []

for page in range(0, 5):
    data = {'limit': 100, 'offset': page * 100, 'pageNumber': page + 1}
    json_arr = requests.post(
        'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json', 
        headers=headers, 
        data=data).text
    d = json.loads(json_arr)
    dfs.append(pd.read_json(json.dumps(d['rows']), orient='list'))

df = pd.concat(dfs, sort=False)

PS: The second block is much preferred: you should never call DataFrame.append or pd.concat inside a for-loop, because each call copies every row accumulated so far, which leads to quadratic copying. Collect the pieces in a list and concatenate once after the loop. Thanks @parfait!
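As a self-contained sketch of that list-then-concat pattern (using inline dummy rows in place of the live endpoint, since the URL may not always be reachable; note that for a list of flat dicts, pd.DataFrame(rows) is equivalent to the pd.read_json(json.dumps(rows)) round trip above):

```python
import pandas as pd

# Simulated pages of 'rows', shaped like the JSON the endpoint returns
pages = [
    [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}],
    [{'id': 3, 'name': 'c'}],
]

dfs = []
for rows in pages:
    # Build one DataFrame per page; no concatenation inside the loop
    dfs.append(pd.DataFrame(rows))

# Concatenate once at the end; ignore_index renumbers rows 0..n-1
df = pd.concat(dfs, ignore_index=True)
print(len(df))  # 3
```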

Upvotes: 2
