Reputation: 4792
Right now I am messing with some JSON data and trying to push it into a MySQL database on the fly. The JSON file is enormous, so I have to carefully go through it line by line using a generator (yield) in Python, convert each JSON line into a small pandas DataFrame, and write it to MySQL. The problem is that when I create the DataFrame from the JSON it adds an index column, and it seems that when I write to MySQL the index=False option is ignored. Code below:
import gzip
import pandas as pd
from sqlalchemy import create_engine

# stuff to parse json file
def parseJSON(path):
    g = open(path, 'r')
    for l in g:
        yield eval(l)

# MySQL engine
engine = create_engine('mysql://login:password@localhost:1234/MyDB', echo=False)

# empty df just to have it
df = {}
for l in parseJSON("MyFile.json"):
    df = pd.DataFrame.from_dict(l, orient='index')
    df.to_sql(name='MyTable', con=engine, if_exists='append', index=False)
And I get an error:
OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column '0' in 'field list'")
Any ideas what I am missing? Or is there a way to get around this stuff?
UPD. I see that the dataframe has an unnamed column with value 0 each time I create it in the inner loop.
Here is some info about DF:
df
Out[155]:
                                                                0
reviewerID                                         A1C2VKKDCP5H97
asin                                                   0007327064
reviewerName                                        Donna Polston
helpful                                                   [0, 0]
unixReviewTime                                         1392768000
reviewText      love Oddie ,One of my favorite books are the O...
overall                                                         5
reviewTime                                            02 19, 2014
summary                                                       Wow

print(df.columns)
RangeIndex(start=0, stop=1, step=1)
Upvotes: 3
Views: 1153
Reputation: 8493
You currently have a frame with one column named 0, and your intended column names are sitting on the index of the frame. Perhaps you can try:
df = pd.DataFrame.from_dict(l)
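As a rough illustration (not part of the original answer; the sample record below is a made-up fragment of one JSON line), keeping orient='index' and then transposing also gives a single row whose column names are the JSON keys, which is what to_sql needs to match the MySQL columns:

import pandas as pd

# hypothetical sample record shaped like one line of the reviews file
l = {"reviewerID": "A1C2VKKDCP5H97", "asin": "0007327064", "overall": 5}

# orient='index' puts the keys on the index and leaves one column named 0;
# transposing turns that into one row with the keys as the column names
df = pd.DataFrame.from_dict(l, orient='index').T
print(df.columns.tolist())   # ['reviewerID', 'asin', 'overall']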
NOTE: I think you would get much better performance if you could build up a dict (or some other structure), convert all the rows to a DataFrame at once, and then push that to MySQL. One row at a time might be too slow.
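A minimal sketch of that batching idea, assuming the parseJSON generator, table name, and engine URL from the question; the chunk size of 10000 is an arbitrary choice:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql://login:password@localhost:1234/MyDB', echo=False)

chunk = []
for l in parseJSON("MyFile.json"):           # generator from the question
    chunk.append(l)
    if len(chunk) >= 10000:                  # flush every 10k records (arbitrary size)
        pd.DataFrame(chunk).to_sql(name='MyTable', con=engine,
                                   if_exists='append', index=False)
        chunk = []

if chunk:                                    # flush whatever is left over
    pd.DataFrame(chunk).to_sql(name='MyTable', con=engine,
                               if_exists='append', index=False)

pd.DataFrame(chunk) builds the frame from a list of dicts, so the keys become the column names directly and no transpose is needed.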
Upvotes: 2