DDV
DDV

Reputation: 2384

Translate API JSON to Dataframe

The API I am connecting to has the following JSON reponse (as per their docs):

{
  "total_results": 0,
  "page_size": 0,
  "page_number": 0,
  "offers": [
    {
      "tsin_id": 0,
      "offer_id": 0,
      "sku": "string",
      "barcode": "string",
      "product_label_number": "string",
      "selling_price": 0,
      "rrp": 0,
      "leadtime_days": 0,
      "leadtime_stock": [
        {
          "merchant_warehouse": {
            "warehouse_id": 0,
            "name": "string"
          },
          "quantity_available": 0
        }
      ],
      "status": "string",
      "title": "string",
      "offer_url": "string",
      "stock_cover": [
        {
          "warehouse_id": 0,
          "stock_cover_days": 0
        }
      ],
      "sales_units": [
        {
          "warehouse_id": 0,
          "sales_units": 0
        }
      ],
      "discount": "string",
      "discount_shown": true,
      "storage_fee_eligible": true,
      "date_created": "string",
      "stock_at_takealot_total": 0,
      "stock_at_takealot": [
        {
          "warehouse": {
            "warehouse_id": 0,
            "name": "string"
          },
          "quantity_available": 0
        }
      ],
      "stock_on_way": [
        {
          "warehouse": {
            "warehouse_id": 0,
            "name": "string"
          },
          "quantity_available": 0
        }
      ]
    }
  ]
}

My current code is below:

import pandas as pd
from pandas import json_normalize
import requests as rq
import json
from datetime import datetime

# API information
url = "https://seller-api.takealot.com/v2"
endpoint = "/offers?"
api_key = "Key xyz"

header = {
        'Authorization': api_key
}

full_url = url + endpoint
    
response = rq.get(full_url, headers=header)
    
# convert to dataframe
data = response.text
info = json.loads(data)

df = json_normalize(info["offers"])

# end time
print(datetime.now().strftime('%H:%M:%S'))

This returns mostly correct, but columns like leadtime_stock don't return correctly. For example, if I print out tsin_id I get:

print(df['tsin_id'].head())

0    74215970
1    74759079
2    44290949
3    48140749
4    74824209
Name: tsin_id, dtype: int64

But for leadtime_stock I get:

print(df['leadtime_stock'].head())

0    [{'merchant_warehouse': {'warehouse_id': 13167...
1    [{'merchant_warehouse': {'warehouse_id': 13167...
2    [{'merchant_warehouse': {'warehouse_id': 13167...
3    [{'merchant_warehouse': {'warehouse_id': 13167...
4    [{'merchant_warehouse': {'warehouse_id': 13167...
Name: leadtime_stock, dtype: object

Is there an easy way to convert the columns like leadtime_stock, stock_at_takealot, stock_on_way into the same df at the same time, or after the first normalize?

EDIT

Required dataframe (as an example) I am after will be in the format:

"tsin_id" | "offer_id" | "sku" | "barcode" | "product_label_number" | "selling_price" | "rrp" | "leadtime_days" | "leadtime_stock" | "merchant_warehouse" | "warehouse_id" | "name" | "quantity available"

Upvotes: 1

Views: 105

Answers (1)

sammywemmy
sammywemmy

Reputation: 28649

Using json_normalize, pass leadtime_stock as record_path, and the keys surrounding leadtime_stock to meta (these parameters are noted in the docs) :

pd.json_normalize(data['offers'], 'leadtime_stock', ['tsin_id','offer_id',
                                                     'sku', 'barcode', 
                                                    'product_label_number', 
                                                    'selling_price', 'rrp', 
                                                    'leadtime_days'])

Upvotes: 2

Related Questions