Reputation: 357
I have some financial data for over 6600 stocks stored in a Foxpro database. I could download the database views into a set of 15 files, which I did first into .dbf files and then into .txt files (comma-delimited).
For the .dbf set of files I used a spatialite virtualization extension with Python and Sqlite to convert them into Sqlite tables then merged them into an 8-table database (Let's call it DBF-derived). So with c for cursor:
c.execute("CREATE VIRTUAL TABLE temp_virt USING VirtualDbf({}, UTF-8)".format(file))
c.execute("CREATE TABLE {} AS SELECT * FROM temp_virt;".format(table_name))
For the .txt files, I used Pandas to convert and combine 12 of the 15 files into 5 CSV files, then I plied them with other remaining 3 .txt files in Python and Sqlite to create an 8-table database (let's call it CSV-derived) using a modified version of this code (from this page):
with open(csvfile, "rb") as f:
reader = csv.reader(f)
header = True
for row in reader:
if header:
# gather column names from the first row of the csv
header = False
sql = "DROP TABLE IF EXISTS %s" % tablename
c.execute(sql)
sql = "CREATE TABLE %s (%s)" % (tablename,
", ".join([ "%s text" % column for column in row ]))
c.execute(sql)
for column in row:
if column.lower().endswith("_id"):
index = "%s__%s" % ( tablename, column )
sql = "CREATE INDEX %s on %s (%s)" % ( index, tablename, column )
c.execute(sql)
insertsql = "INSERT INTO %s VALUES (%s)" % (tablename,
", ".join([ "?" for column in row ]))
Any explanation as to why the difference in size? Is it because of the 3 .txt files that I did not convert to CSV?
Upvotes: 0
Views: 504
Reputation: 23837
It is hard to understand what you are doing and particularly why you would ever want to use a CSV in between when you could directly get data from another database system. Anyway, it is your choice, the difference is probably for the fact that VFP DBF data with character fields have trailing spaces. Say a 30 chars field, having a single letter in it still has a length of 30. Your conversion to SQLite might not be trimming the trailing spaces, while in a CSV file those data are already saved as trimmed. Probably the easiest and most reliable way would be to directly create the SQLite tables and fill them with data from within a VFP program (using VFP is not a must of course, could be done in any language).
Upvotes: 1