Reputation: 3383
I have CSV files that I want to make database tables from in mysql. I've searched all over and can't find anything on how to use the header as the column names for the table. I suppose this must be possible. In other words, when creating a new table in MySQL do you really have to define all the columns, their names, their types etc in advance. It would be great if MySQL could do something like Office Access where it converts to the corresponding type depending on how the value looks.
I know this is maybe a too broadly defined question, but any pointers in this matter would be helpful. I am learning Python too, so if it can be done through a python script that would be great too.
Thank you very much.
Upvotes: 0
Views: 3784
Reputation: 148965
The csv
module can easily give you the column names from the first line, and then the values from the other ones. The hard part will be do guess the correct column types. When you load a csv file into an Excel worksheet, you only have few types : numeric, string, date.
In a database like MySQL, you can define the size of string columns, and you can give the table a primary key and eventually other indexes. You will not be able to guess that part automatically from a csv file.
At the simplest way, you can treat all columns as varchar(255)
. It is really uncommon to have fields in a csv file that do not fit in 255 characters. If you want something more clever, you will have to scan the file twice : first time to control the maximum size for each colum, and at the end, you could take the minimum power of 2 greater than that. Next step would be to control if any column contains only integers or floating point values. It begins to be harder to do that automatically, because the representation of floating point values may be different depending on the locale. For example 12.51
in an english locale would be 12,51
in a french locale. But Python can give you the locale.
The hardest thing would be eventual date or datetime fields, because there are many possible formats only numeric (dd/mm/yyyy
or mm/dd/yy
) or using plain text (Monday, 29th of september
).
My advice would be to define a default mode, for example all string, or just integer and strings, and use configuration parameters or even a configuration file to finely tune conversion per column.
For the reading part, the csv module
will give you all what you need.
Upvotes: 0
Reputation: 1224
Using Python, you could use the csv DictReader module to makes it pretty easy to use the headers from the csv files as labels for the input data. It basically reads all lines in as a dictionary object with the keys as the headers, so you can use the keys as the source for your column names when accessing mySQL.
A quick example that reads a csv into a list of dictionaries:
example.csv:
name,address,city,state,phone
jack,111 washington st, somewhere, NE, 888-867-5309
jill,112 washington st, somewhere else, NE, 888-867-5310
john,113 washington st, another place, NE, 888-867-5311
example.py:
import csv
data = []
with open("example.csv") as csvfile:
reader = csv.DictReader(csvfile)
for line in reader:
data.append(line)
print(data[0].keys())
print(data[0]['address'])
print(data[1]['name'])
print(data[2]['phone'])
output:
$:python example.py
dict_keys(['name', 'address', 'city', 'state', 'phone'])
111 washington st
jill
888-867-5311
More in-depth examples at: http://java.dzone.com/articles/python-101-reading-and-writing
Some info on connection to MySQL in Python: How do I connect to a MySQL Database in Python?
Upvotes: 1