walterhpdx
walterhpdx

Reputation: 21

mysql import multiple lines as a single record

Posted this to Reddit yesterday, but no love. I'm on Centos, writing bash scripts and parsing data to import into mysql.

I'm having to convert a story archive that stored the main part of the stories in a plain text file, and need to be able to import these multiple-lined text files into a column in my database. I know I can use mysqlimport, and I have the file designated as a pipe delimited - BUT because the text file I'm importing has carriage returns/line breaks in them, it's importing each paragraph as its own row. So a 9 paragraph text file will import as 9 rows when I use mysqlimport.

Is there a way to do this?

I know the ideal text file for importing (with pipe delimiters) would be like (without the blank line between):

this is my record|12345

another record|24353

have another bagel, why don't you?|43253

However, my file is actually closer to this:

This is the first line of my first paragraph. And now I'm going to do some more line wrapping and stuff.

This is a second line from the same text file that should be treated as a single record along with the first line in a single "blob" or text field. |12345

This is the last stumbling block to recover from a bad piece of software someone dropped in my lap, and I hope this can be done. I have 14,000 of these text files (each in this format), so doing them by hand is kind of out of the question.

Upvotes: 1

Views: 529

Answers (2)

Walter A
Walter A

Reputation: 20032

I do not know about performance when you converting the lines to sql statements. I think it can be useful:

Input

This is the first line of my first paragraph. And now I'm going to do some more line wrapping and stuff.

This is a second line from the same text file that should be treated as a single record along with the first line in a single "blob" or text field. |12345                                                                    
I am hoping I understood the question correct.                                                                 
Everything without a pipe is part of the first field.                                                          
And the line with a pipe is for field 1 and 2.                                                                 
Like this one |12346

Script

my_insert="INSERT INTO my_table                                                                                
     (field1, field2)                                                                                          
     VALUES                                                                                                    
     ('"                                                                                                       

   firstline=0    
   while read -r line; do
      if [[ -z "${line}" ]]; then
         printf "\n"             
         continue;               
      fi                         
      if [[ "${firstline}" -eq 0 ]]; then
         printf "%s" "${my_insert}"      
         firstline=1                     
      fi                                 
      line_no_pipe=${line%|*}            
      if [[ "${line}" = "${line_no_pipe}" ]]; then
         printf "%s\n" "${line}"                  
      else                                        
         printf "%s',%s);\n" "${line_no_pipe}" "${line##*|}"
         firstline=0                                        
      fi                                                    
   done < input                                             

Output

INSERT INTO my_table
     (field1, field2)
     VALUES
     ('This is the first line of my first paragraph. And now I'm going to do some more line wrapping and stuff.

This is a second line from the same text file that should be treated as a single record along with the first line in a single "blob" or text field. ',12345);
INSERT INTO my_table
     (field1, field2)
     VALUES
     ('I am hoping I understood the question correct.
Everything without a pipe is part of the first field.
And the line with a pipe is for field 1 and 2.
Like this one ',12346);

Upvotes: 1

Sathik Khan
Sathik Khan

Reputation: 439

Encode / transmit new line as '\n' and same way tab as '\t'. And this is the best practice when you are storing any url or raw text into your database. This will also help you to avoid the sql injection and solve your current problem too...

Please let me know if this helps. Thanks.

Upvotes: 1

Related Questions