Reputation: 804
I am trying to determine the type of data contained in each column of a .csv
file so that I can make CREATE TABLE
statements for MySQL. The program makes a list of all the column headers and then grabs the first row of data and determines each data type and appends it to the column header for proper syntax. For example:
ID Number Decimal Word
0 17 4.8 Joe
That would produce something like CREATE TABLE table_name (ID int, Number int, Decimal float, Word varchar());
.
The problem is that in some of the .csv
files the first row contains a NULL
value that is read as an empty string and messes up this process. My goal is to then search each row until one is found that contains no NULL
values and use that one when forming the statement. This is what I have done so far, except it sometimes still returns rows that contains empty strings:
def notNull(p): # where p is a .csv file that has been read in another function
tempCol = next(p)
tempRow = next(p)
col = tempCol[:-1]
row = tempRow[:-1]
if any('' in row for row in p):
tempRow = next(p)
row = tempRow[:-1]
else:
rowNN = row
return rowNN
Note: The .csv
file reading is done in a different function, whereas this function simply uses the already read .csv
file as input p
. Also each row is ended with a ,
that is treated as an extra empty string so I slice the last value off of each row before checking it for empty strings.
Question: What is wrong with the function that I created that causes it to not always return a row without empty strings? I feel that it is because the loop is not repeating itself as necessary but I am not quite sure how to fix this issue.
Upvotes: 0
Views: 114
Reputation: 2456
I cannot really decipher your code. This is what I would do to only get rows without the empty string.
import csv
def g(name):
with open('file.csv', 'r') as f:
r = csv.reader(f)
# Skip headers
row = next(r)
for row in r:
if '' not in row:
yield row
for row in g('file.csv'):
print('row without empty values: {}'.format(row))
Upvotes: 2