Reputation: 1170
I have a pandas dataframe containing some business data, and a JSON object that contains (among other things) a way to map one identifier to another (more granular to less granular). Is there any python or pandas function to accomplish this without converting the JSON to a dataframe first?
e.g.
input data frame
Bob, 123, Jakes Lane
mapping json
mapping = {'street number': 123, 'street name': 'Jakes Lane', 'postalcode': 'A1B2C3'}
What I want as the output is
Bob, 123, Jakes Lane, A1B2C3
Upvotes: 0
Views: 52
Reputation: 18782
This demo code shows how to iterate over all rows of a dataframe and collect values (zip codes) from a JSON object. In the final step, the collected values are used to create a new column in the dataframe.
import pandas as pd
from io import StringIO

data2 = """ID, Name, Number, Street
1, Bob, 123, Jakes Lane
2, Cat, 415, Carrot Alley
3, Derek, 741, Dirt Lane"""

# Proposed JSON object data, keyed by street number
better_json = {
    123: {'street number': 123, 'street name': 'Jakes Lane', 'postalcode': 'A1B2C3'},
    741: {'street number': 741, 'street name': 'Dirt Lane', 'postalcode': 'A2Z4C6'},
    531: {'street number': 531, 'street name': 'Fair Lane', 'postalcode': 'A0Z1C9'}
}

df2 = pd.read_csv(StringIO(data2), sep=r',\s+', index_col='ID', engine='python')
print(df2)

# Loop over the Number column and collect the matching zip codes
link_values = []
for i, item in df2.Number.items():  # .items() replaces the deprecated .iteritems()
    try:
        link_values.append(better_json[item]['postalcode'])
    except KeyError:
        link_values.append("")

df2["zipcode"] = link_values
print()
print(df2)
The output printed by the code:
    Name  Number        Street
ID
1    Bob     123    Jakes Lane
2    Cat     415  Carrot Alley
3  Derek     741     Dirt Lane

    Name  Number        Street zipcode
ID
1    Bob     123    Jakes Lane  A1B2C3
2    Cat     415  Carrot Alley
3  Derek     741     Dirt Lane  A2Z4C6
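As an aside, the same lookup can be done without an explicit loop by flattening the nested dict and using Series.map. This is a sketch assuming the same df2 and better_json shapes as above:

```python
import pandas as pd

# Same shape of data as above
df2 = pd.DataFrame({'Name': ['Bob', 'Cat', 'Derek'],
                    'Number': [123, 415, 741]})
better_json = {
    123: {'street number': 123, 'street name': 'Jakes Lane', 'postalcode': 'A1B2C3'},
    741: {'street number': 741, 'street name': 'Dirt Lane', 'postalcode': 'A2Z4C6'},
    531: {'street number': 531, 'street name': 'Fair Lane', 'postalcode': 'A0Z1C9'}
}

# Flatten the nested dict to Number -> postalcode, then map it onto the column;
# numbers with no entry become NaN, which fillna replaces with ""
pc_by_number = {k: v['postalcode'] for k, v in better_json.items()}
df2['zipcode'] = df2['Number'].map(pc_by_number).fillna('')
print(df2)
```

Series.map avoids the try/except entirely, since missing keys simply produce NaN.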
Upvotes: 1
Reputation: 1247
Per your comment, here is fixed code to do the join using 2 methods.
First, we need to define the problem data. There are 2 datasets: a dataframe of people and a JSON address list.
The goal is to join the 2 datasets to generate a list of name, street number, street name, postalcode - one comma-separated (CSV) line per person.
Below are 2 methods to do the join. IMHO, doing it with a temp dataframe would be much better. Unsure of the size of each dataset; assume they are LARGE. Thus METHOD 2 seems to be the better approach, as it does 1 pass thru the JSON and 1 pass thru the dataframe.
Here is the data, parsed with json.loads():
import json
import numpy as np
import pandas as pd

df = pd.DataFrame([{"name": "Bob", "street number": 123, "street name": "Jakes Lane"}])

# Define JSON content (as it would appear in a file)
jstr = '{"addrList" : [{"street number":123, "street name":"Jakes Lane", "postalcode" : "A1B2C3"}]}'

# Parse the JSON string into a dict
jDict = json.loads(jstr)
METHOD 1: for every address in the JSON, use apply to add the postal code to the rows that match
df['postalcode'] = np.nan
sn = 'street number'
snm = 'street name'

for addr in jDict['addrList']:
    pc = addr['postalcode']
    # Fill in the postal code on rows whose street number and name match this address
    df['postalcode'] = df.apply(
        lambda r: pc if (r[sn] == addr[sn]) and (r[snm] == addr[snm]) else r['postalcode'],
        axis=1)

infoList = list(df.apply(lambda r: '%s, %s, %s, %s' % (r['name'], r[sn], r[snm], r['postalcode']), axis=1))
infoList
METHOD 2: transform the JSON into a dict indexed by street number + street name, then search the dataframe for matches
d = {str(a[sn]) + a[snm]: a['postalcode'] for a in jDict['addrList']}

def matchAddr(r):
    a = str(r[sn]) + r[snm]
    pc = d.get(a, np.nan)  # np.nan when there is no matching address
    return '%s, %s, %s, %s' % (r['name'], r[sn], r[snm], pc)

list(df.apply(matchAddr, axis=1))
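For completeness, the temp-dataframe approach mentioned above could look like this. It is a sketch using pd.json_normalize on the same jstr and df as defined earlier, then a left join on the two shared address columns:

```python
import json
import pandas as pd

df = pd.DataFrame([{"name": "Bob", "street number": 123, "street name": "Jakes Lane"}])
jstr = '{"addrList" : [{"street number":123, "street name":"Jakes Lane", "postalcode" : "A1B2C3"}]}'

# Build a temporary dataframe from the JSON address list and left-join on the shared keys
addr_df = pd.json_normalize(json.loads(jstr)['addrList'])
merged = df.merge(addr_df, on=['street number', 'street name'], how='left')
print(merged)
```

Rows with no matching address simply end up with NaN in the postalcode column, and pandas handles the matching in one call.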
Upvotes: 1