Reputation: 2281
I have a file that can contain from 3 to 5 columns of numerical values separated by commas. Empty fields are explicitly delimited (two adjacent commas), except when they fall at the end of the row, where they are omitted:
1,2,3,4,5
1,2,3,,5
1,2,3
The following table was created in MySQL:
+-------+--------+------+-----+---------+-------+
| Field | Type   | Null | Key | Default | Extra |
+-------+--------+------+-----+---------+-------+
| one   | int(1) | YES  |     | NULL    |       |
| two   | int(1) | YES  |     | NULL    |       |
| three | int(1) | YES  |     | NULL    |       |
| four  | int(1) | YES  |     | NULL    |       |
| five  | int(1) | YES  |     | NULL    |       |
+-------+--------+------+-----+---------+-------+
I am trying to load the data using MySQL LOAD command:
LOAD DATA INFILE '/tmp/testdata.txt' INTO TABLE moo FIELDS
TERMINATED BY "," LINES TERMINATED BY "\n";
The resulting table:
+------+------+-------+------+------+
| one  | two  | three | four | five |
+------+------+-------+------+------+
|    1 |    2 |     3 |    4 |    5 |
|    1 |    2 |     3 |    0 |    5 |
|    1 |    2 |     3 | NULL | NULL |
+------+------+-------+------+------+
The problem is that when a field is present but empty in the raw data, MySQL for some reason does not use the column's default value (which is NULL) and uses zero instead. NULL is used correctly when the field is missing altogether.
Unfortunately, I have to be able to distinguish between NULL and 0 at this stage so any help would be appreciated.
Thanks S.
edit
The output of SHOW WARNINGS:
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: '' for column 'four' at row 2 |
| Warning | 1261 | Row 3 doesn't contain data for all columns             |
| Warning | 1261 | Row 3 doesn't contain data for all columns             |
+---------+------+--------------------------------------------------------+
Upvotes: 196
Views: 214203
Reputation: 4092
The chosen solution is one way to do it; however, there is a difference between NULL and an empty string. For numbers it works. The earlier solutions allow you to keep the NULL when loading the data.
It really is a shame that sed hasn't implemented PCRE. That would give it look ahead, and a slew of other capabilities. But the perl solution is:
perl -pi -e 's/,(?=,|$)/,\\N/g' /my/file
Note: the i means in place editing. Haven't tested it, but it creates a neat solution if it works.
It works because the lookahead doesn't consume a character, and so the forward match is ready for the next match.
Please test, and of course, your shell escapes may need tweaking.
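For anyone who wants to try the lookahead idea without touching a file, here is a minimal Python sketch of the same substitution. The function name `mark_nulls` is made up for illustration, and it assumes plain comma-separated values with no quoted fields. A function is passed as the replacement so that `\N` is inserted literally rather than parsed as a regex escape.

```python
import re

# A comma that is followed by another comma or by the end of the line.
# The lookahead (?=...) does not consume the following comma, so that
# comma is still available to start the next match.
EMPTY_AFTER_COMMA = re.compile(r",(?=,|$)")


def mark_nulls(line: str) -> str:
    """Insert MySQL's \\N marker after every comma that precedes an empty field."""
    return EMPTY_AFTER_COMMA.sub(lambda match: r",\N", line)


if __name__ == "__main__":
    for row in ["1,2,3,4,5", "1,2,3,,5", "1,,,,5", "1,2,3,"]:
        print(mark_nulls(row))
```

Note that, like the perl one-liner, this does not catch an empty field at the very start of a line, because there is no comma in front of it.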
Upvotes: 1
Reputation: 19
You can first read the file into a pandas DataFrame, then replace the empty values with the string 'NULL' wherever you want NULL, using the replace function (dataframe_name.replace(value_to_be_replaced, 'NULL')),
and save the new DataFrame in .csv format using the to_csv function.
After this, import the CSV file into MySQL using:
mysql --local-infile=1 -u root -p
SET GLOBAL local_infile=1;
use
load data local infile '<path_to_file>' into table <table_name> columns terminated by "," optionally enclosed by "'" ignore 1 lines;
Then all the NULL values in the dataset will be recognised as NULL only.
I hope it helps.
Upvotes: 1
Reputation: 43
MySQL converts empty fields into the empty string '', hence the error when inserting into numeric fields: '' is not a valid integer, so the conversion from string to INT fails. This happens even when the INT column is declared DEFAULT NULL in the CREATE TABLE statement.
The straightforward solution would be to preprocess the CSV and insert \N (not \n) in place of the empty fields, which MySQL reads as NULL.
This can be done quickly with:
sed -i 's/,,/,\\N,/g' file.csv
sed -i 's/,,/,\\N,/g' file.csv
It is important to do it twice because consecutive blank fields will be skipped, since the second separator of a blank field is also the first separator of the next field, and it will be skipped after the first substitution.
In other words, if you use only one command, something,,,,SomethingElse will be converted to something,\N,,\N,SomethingElse.
Maybe there is a smarter way to do it with a more advanced command, but this works just fine. You can loop through all CSV files in a directory and run the command twice for each file. (reference)
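To see the skipped-separator effect in isolation, here is a small Python sketch that mimics the sed substitution. The names `one_pass` and `fill_blanks` are hypothetical, and the sketch only covers blanks between two commas, exactly like the sed command. It repeats the substitution until the line stops changing instead of hard-coding two runs.

```python
import re


def one_pass(line: str) -> str:
    # Same idea as sed 's/,,/,\\N,/g': matches are found left to right and
    # cannot overlap, so the comma that closes one blank field cannot also
    # open the next one within the same pass.
    return re.sub(",,", lambda match: r",\N,", line)


def fill_blanks(line: str) -> str:
    # Re-apply the substitution until nothing changes any more.
    previous = None
    while previous != line:
        previous, line = line, one_pass(line)
    return line


if __name__ == "__main__":
    sample = "something,,,,SomethingElse"
    print(one_pass(sample))     # some blanks are still left after one pass
    print(fill_blanks(sample))  # every blank between commas is filled
```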
Upvotes: -1
Reputation: 31
I converted the input file to include \N for the blank column data using the sed command below in a Unix terminal:
sed -i 's/,,/,\\N,/g' $file_name
and then used the LOAD DATA INFILE command to load it into MySQL.
Upvotes: 2
Reputation: 12751
This will do what you want. It reads the fourth field into a local variable, and then sets the actual field value to NULL, if the local variable ends up containing an empty string:
LOAD DATA INFILE '/tmp/testdata.txt'
INTO TABLE moo
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
(one, two, three, @vfour, five)
SET four = NULLIF(@vfour,'')
;
If they're all possibly empty, then you'd read them all into variables and have multiple assignments in the SET clause, like this:
LOAD DATA INFILE '/tmp/testdata.txt'
INTO TABLE moo
FIELDS TERMINATED BY ","
LINES TERMINATED BY "\n"
(@vone, @vtwo, @vthree, @vfour, @vfive)
SET
one = NULLIF(@vone,''),
two = NULLIF(@vtwo,''),
three = NULLIF(@vthree,''),
four = NULLIF(@vfour,''),
five = NULLIF(@vfive,'')
;
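With many columns, writing the variable list and the SET clause by hand gets tedious. Below is a rough Python sketch that generates the statement text from a list of column names. The helper `build_load_statement` and the `@v` prefix convention are assumptions made for illustration, not anything MySQL provides, and the sketch does no quoting or escaping, so only feed it trusted table and column names.

```python
from typing import Sequence


def build_load_statement(path: str, table: str, columns: Sequence[str]) -> str:
    """Build a LOAD DATA INFILE statement that maps empty fields to NULL.

    Every column is read into a user variable (@v<column>) and then
    assigned through NULLIF(@v<column>, ''), as in the statement above.
    """
    variables = ", ".join(f"@v{name}" for name in columns)
    assignments = ",\n".join(f"  {name} = NULLIF(@v{name}, '')" for name in columns)
    return (
        f"LOAD DATA INFILE '{path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n'\n"
        f"({variables})\n"
        "SET\n"
        f"{assignments};"
    )


if __name__ == "__main__":
    print(build_load_statement("/tmp/testdata.txt", "moo",
                               ["one", "two", "three", "four", "five"]))
```

The printed text can then be pasted into the mysql client or passed to whatever connector you already use.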
Upvotes: 236
Reputation: 19
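(variable1, @variable2, ..) SET variable2 = NULLIF(TRIM(@variable2), '') >> TRIM makes this match both '' and ' '; you can put any condition here. (Writing '' or ' ' as the second argument does not work as intended: MySQL evaluates it as a boolean, so the comparison is against 0 and real zeros become NULL too.)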
Upvotes: 1
Reputation: 87
show variables
SHOW VARIABLES LIKE "secure_file_priv";
Note: keep your CSV file in the location given by the above command.
create table assessments (course_code varchar(5),batch_code varchar(7),id_assessment int, assessment_type varchar(10), date int , weight int);
Note: here the 'date' column has some blank values in the CSV file.
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/assessments.csv'
INTO TABLE assessments
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY ''
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(course_code,batch_code,id_assessment,assessment_type,@date,weight)
SET date = IF(@date = '', NULL, @date);
Upvotes: 4
Reputation: 101
The behaviour differs depending on the database configuration. In strict mode this throws an error; otherwise it produces a warning. The following query shows the current configuration:
mysql> show variables like 'sql_mode';
Upvotes: 8
Reputation: 3236
MySQL manual says:
When reading data with LOAD DATA INFILE, empty or missing columns are updated with ''. If you want a NULL value in a column, you should use \N in the data file. The literal word “NULL” may also be used under some circumstances.
So you need to replace the blanks with \N like this:
1,2,3,4,5
1,2,3,\N,5
1,2,3
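If you would rather not hand-edit the file, here is a minimal Python sketch that does this replacement with the standard csv module, which also copes with quoted fields. The function name `blanks_to_null_marker` is made up for illustration. It only rewrites fields that are present but empty; rows that are simply shorter are left as they are, since MySQL already loads the missing trailing columns as NULL in the question's output.

```python
import csv
import io


def blanks_to_null_marker(text: str) -> str:
    """Return the CSV text with every empty field replaced by \\N."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        # Keep non-empty fields untouched; write the two characters \N
        # wherever the field is an empty string.
        writer.writerow([field if field != "" else r"\N" for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = "1,2,3,4,5\n1,2,3,,5\n1,2,3\n"
    print(blanks_to_null_marker(sample), end="")
```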
Upvotes: 165
Reputation: 1446
Preprocess your input CSV to replace blank entries with \N.
Attempt at a regex: s/,,/,\N,/g and s/,$/,\N/g
Good luck.
Upvotes: 5