B-road95

Reputation: 109

JSON formatted string to pandas dataframe

OK, I have been beating my head against the wall with this one all afternoon. I know that there are many similar posts, but I keep getting errors and am probably making a stupid mistake.

I am using the apyori package found here to do some transaction basket analysis: https://pypi.python.org/pypi/apyori/1.1.1

It appears that the package's dump_as_json() method writes out one dictionary per RelationRecord, one for each possible basket.

I want to load these JSON-formatted dictionaries into a single pandas dataframe, but I keep running into different errors when I try to use pd.read_json().

Here is my code:

import apyori, shutil, os
from apyori import apriori
from apyori import dump_as_json
import pandas as pd
import json

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

transactions = [
    ['Jersey','Magnet'],
    ['T-Shirt','Cap'],
    ['Magnet','T-Shirt'],
    ['Jersey', 'Pin'],
    ['T-Shirt','Cap']
]
results = list(apriori(transactions))
results_df = pd.DataFrame()
for RelationRecord in results:
    dump_as_json(RelationRecord,output_file)
print output_file.getvalue()
json_file = json.dumps(output_file.getvalue())
print json_file


print data_df.head()

Any ideas how to get the json formatted dictionaries stored in output_file into a pandas dataframe?

Upvotes: 1

Views: 1494

Answers (2)

Nitesh Sharma

Reputation: 61

You can convert the apriori result into a more readable dataframe using the following script:

summary_df = pd.DataFrame(columns=('Items', 'Antecedent', 'Consequent', 'Support', 'Confidence', 'Lift'))

Support = []
Confidence = []
Lift = []
Items = []
Antecedent = []
Consequent = []

# One row per ordered statistic (rule) of each RelationRecord
for RelationRecord in results:
    for ordered_stat in RelationRecord.ordered_statistics:
        Support.append(RelationRecord.support)
        Items.append(RelationRecord.items)
        Antecedent.append(ordered_stat.items_base)
        Consequent.append(ordered_stat.items_add)
        Confidence.append(ordered_stat.confidence)
        Lift.append(ordered_stat.lift)

summary_df['Items'] = Items
summary_df['Antecedent'] = Antecedent
summary_df['Consequent'] = Consequent
summary_df['Support'] = Support
summary_df['Confidence'] = Confidence
summary_df['Lift'] = Lift

The final dataframe has one row per rule, with the columns Items, Antecedent, Consequent, Support, Confidence, and Lift.

Hope this helps :)
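The same flattening can also be written as a list of row dictionaries instead of six parallel lists. The sketch below is only an illustration: the two namedtuples are stand-ins that assume apyori's result objects expose the attribute names used in the script above, and the sample values are made up rather than real apriori output.

```python
from collections import namedtuple

# Stand-ins for apyori's result types (assumed attribute names,
# taken from the script above; not imported from apyori itself).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

# Made-up sample data, for illustration only.
results = [
    RelationRecord(
        items=frozenset(["Cap", "T-Shirt"]),
        support=0.4,
        ordered_statistics=[
            OrderedStatistic(frozenset(["Cap"]), frozenset(["T-Shirt"]), 1.0, 1.5),
            OrderedStatistic(frozenset(["T-Shirt"]), frozenset(["Cap"]), 0.5, 1.5),
        ],
    ),
]

# Build one flat dictionary per (record, ordered statistic) pair.
# sorted() turns each frozenset into a plain, stable list.
rows = [
    {
        "Items": sorted(record.items),
        "Antecedent": sorted(stat.items_base),
        "Consequent": sorted(stat.items_add),
        "Support": record.support,
        "Confidence": stat.confidence,
        "Lift": stat.lift,
    }
    for record in results
    for stat in record.ordered_statistics
]

print(rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` should then give a frame with the same six columns, since pandas accepts a list of dictionaries directly.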

Upvotes: 0

Thtu

Reputation: 2032

I would suggest reading Stack Overflow's guidelines on producing a Minimal, Complete, and Verifiable example. Also, statements like "I keep getting errors" are not very helpful without the actual error messages. That said, I took a look at your code and at the source code for the apyori package. Typos aside, it looks like the problem is in this loop:

for RelationRecord in results:
    dump_as_json(RelationRecord,output_file)

You're creating a file with one JSON object per line (this format is usually called JSON Lines or newline-delimited JSON). Taken as a whole document, it just isn't valid JSON. Instead, you could keep the records as a list of homogeneous dictionaries, or some other structure that pd.DataFrame accepts directly:

output = []
for RelationRecord in results:
    o = StringIO()
    dump_as_json(RelationRecord, o)
    output.append(json.loads(o.getvalue()))
data_df = pd.DataFrame(output)
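To see why the single shared buffer fails, here is a small sketch using only the standard library. The sample records are hypothetical and only roughly shaped like the dictionaries in the question; the exact keys that dump_as_json writes are an assumption.

```python
import json
from io import StringIO

# Hypothetical records; the real keys written by dump_as_json
# are an assumption here.
records = [
    {"items": ["Cap"], "support": 0.4},
    {"items": ["Cap", "T-Shirt"], "support": 0.4},
]

# Write one JSON object per line into a single shared buffer,
# which is what the loop in the question ends up doing.
buffer = StringIO()
for record in records:
    buffer.write(json.dumps(record) + "\n")

text = buffer.getvalue()

# Parsing the whole buffer as one JSON document fails,
# because a second object follows the first one.
try:
    json.loads(text)
    whole_document_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    whole_document_ok = False

# Parsing each non-empty line separately recovers every record.
parsed = [json.loads(line) for line in text.splitlines() if line.strip()]

print(whole_document_ok)
print(parsed)
```

If the buffer has already been filled this way, parsing it line by line like this is an alternative to creating a fresh StringIO per record; the resulting list of dictionaries can be passed to pd.DataFrame in the same way.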

Upvotes: 2
