Brad Davis
Brad Davis

Reputation: 1170

Is there a python function for joining a json file to a dataframe?

I have a pandas dataframe containing some business data, and a json that contains (among other things) a way to map one identifier to another (more granular to less granular). Is there any python or pandas function to accomplish this without converting the json to a dataframe first?

e.g.

input data frame

Bob, 123, Jakes Lane

mapping json

json = {'street number':123, 'street name':'Jakes Lane', postalcode='A1B2C3'}

What I want as the output is

Bob, 123, Jakes Lane, A1B2C3

Upvotes: 0

Views: 52

Answers (2)

swatchai
swatchai

Reputation: 18782

This demo code shows how to iterate all rows of a dataframe and collect values (zipcode) from a JSON. In the final step, the values are used to create a new column of the dataframe

import pandas as pd
from io import StringIO

data2 = """ID, Name, Number, Street
1, Bob, 123, Jakes Lane
2, Cat, 415, Carrot Alley
3, Derek, 741, Dirt Lane"""

# Proposed JSON object data
better_json = {
    123: {'street number': 123, 'street name': 'Jakes Lane', 'postalcode': 'A1B2C3'},
    741: {'street number': 741, 'street name': 'Dirt Lane', 'postalcode': 'A2Z4C6'},
    531: {'street number': 531, 'street name': 'Fair Lane', 'postalcode': 'A0Z1C9'}
}

df2 = pd.read_csv(StringIO( data2 ), sep=',\s+', index_col='ID', engine='python')
print(df2)

# looping and get zipcode
link_values = []
for i, item in df2.Number.iteritems():
    try:
        #print(i, item, better_json[item]['postalcode'])
        link_values.append(better_json[item]['postalcode'])
    except:
        #print(i, item, "No match!")
        link_values.append("")


df2["zipcode"] = link_values
print()
print(df2)

The output printed by the code:

     Name  Number        Street
ID                             
1     Bob     123    Jakes Lane
2     Cat     415  Carrot Alley
3   Derek     741     Dirt Lane

     Name  Number        Street zipcode
ID                                     
1     Bob     123    Jakes Lane  A1B2C3
2     Cat     415  Carrot Alley        
3   Derek     741     Dirt Lane  A2Z4C6

Upvotes: 1

frankr6591
frankr6591

Reputation: 1247

Per your comment, fixed code to do join using 2 methods.

First, we need to define problem data. There are 2 datasets

  1. dataframe with name, street number/address (no postal code)
  2. json list of addresses: street number/address, and postal code

goal is to join 2 datasets to generate list of name, st number, st name, posttalcode - generate CSV (comma separated line per person)

Below are 2 methods to do join. IMHO, doing it with temp dataframe would be much better. Unsure of size of each dataset, assume they are LARGE. Thus METHOD2 seems to be better approach as it does 1 pass thru json and 1 pass thru dataframe.

Here is the data with a json.load():

    import json
    df = pd.DataFrame([{"name":"Bob", "street number":123, "street name":"Jakes Lane"}])
    # Define JSON content (of file content)
    jstr = '{"addrList" : [{"street number":123, "street name":"Jakes Lane", "postalcode" : "A1B2C3"}]}'

    # Import json into dict & convert to string
    jDict = json.loads(jstr)

METHOD 1 : for every address in json, do apply add postal code that match

df['postalcode'] = np.nan

sn = 'street number'
snm = 'street name'
cols = ['name','street number','street']
infoList = []
for addr in jDict['addrList']:
    pc = addr['postalcode']
    infoList = list(df.apply(lambda r: '%s, %s, %s, %s'%(r['name'],r[sn],r[snm],pc) if (r[sn] == addr[sn]) & (r[snm] == addr[snm]) else r.postalcode, axis=1))
infoList

METHOD 2 : Transform json to dict index by st number/name; search dataframe for match

d = {str(a[sn])+a[snm]:a['postalcode'] for a in jDict['addrList']}
def matchAddr(r):
    a = str(r[sn])+r[snm]
    try: pc = d[a]
    except: pc = np.nan
    return '%s, %s, %s, %s'%(r['name'],r[sn],r[snm],pc)
list(df.apply(lambda r : matchAddr(r), axis=1))

Upvotes: 1

Related Questions