Reputation: 101
Greetings Support Community,
I have 10 million+ files that I am trying to load into a MySQL database using the following script:
WORKING_DIR=/tmp
FILE1="*test*"
timestamp_format="%Y-%m-%d %H:%i:%s.%x"
for i in ${WORKING_DIR}/${FILE1}
do
    if [ -f "$i" ]; then
        mysql -uroot -ptest my_database --local-infile=1 <<-SQL
SET sql_log_bin=0;
LOAD DATA LOCAL INFILE '${i}' INTO TABLE my_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(id, transaction_id, app_id, sub_id);
SQL
    fi
done
It's an extremely slow process. After about 24 hours, I've only been able to load about 2 million records; each file contains a single record. At this rate, it will take about 5 days to complete. Is there a faster way to do this? For example, should I concatenate the files before processing?
Any suggestions for improving how this data is loaded into MySQL would be greatly appreciated.
Thanks!
Upvotes: 0
Views: 150
Reputation: 189347
You ask (in a comment) how to concatenate your files. That would be
cat /tmp/*test1*
though apparently you actually want to omit the first line from each:
awk 'FNR>1' /tmp/*test1*
How to make your LOAD DATA statement read from standard input is beyond my competence. If you can't, maybe save the combined output to a temporary file and process that.
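Untested, but that temporary-file route might look something like the sketch below. The mysql invocation, table, and column list are copied from your script; /tmp/combined.csv is just a placeholder name I made up, and IGNORE 1 LINES is dropped because the awk step already strips each file's header line.
# collect every record once, skipping each file's header line
awk 'FNR>1' /tmp/*test1* > /tmp/combined.csv
# load the combined file with a single LOAD DATA statement
mysql -uroot -ptest my_database --local-infile=1 <<-SQL
SET sql_log_bin=0;
LOAD DATA LOCAL INFILE '/tmp/combined.csv' INTO TABLE my_table
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n'
(id, transaction_id, app_id, sub_id);
SQL
That runs one LOAD DATA instead of one mysql process per file, which is presumably where most of your 24 hours is going.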
If you get "argument list too long", maybe try
find /tmp -maxdepth 1 -type f -name '*test1*' -exec awk 'FNR>1' {} +
The -maxdepth 1 says not to descend into subdirectories; take it out if that's not what you want. The -exec with a plus might not be available on really old systems; try \; in its place if you get a syntax error (though there can be a rather unpleasant performance penalty).
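If you do end up on the find route, it can feed the same temporary file as above (again, /tmp/combined.csv is only a placeholder name):
# same header-stripping awk, but find builds the file list instead of the shell glob
find /tmp -maxdepth 1 -type f -name '*test1*' -exec awk 'FNR>1' {} + > /tmp/combined.csv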
I don't see that the variables made anything clearer, easier, more readable, or more maintainable, so I simply took them out.
Upvotes: 1