user2061929

Reputation: 411

Import Unformatted txt file into SQL

I am having an issue importing data into SQL from a text file. Not because I don't know how...but because the formatting is pretty much terrible for this purpose. Below is an altered sample of the types of text files I need to work with:


1   VA -    P
2   VB to 1X    P
3   VC to 1Y    P
4   N - P
5   G to 1G,Frame   P
6   Fout to 1G,Frame    P
7   Open Breaker    P
8   1B to 1X    P
9   1C to 1Y    P

Test Status: Pass
Hi-Pot @ 1500V: Pass 
Customer Order:904177-F
Number: G4901626-200
Serial Number: J245F6-2D03856
Catalog #: CBDC37-X5LE30-H40-L630C-4GJ-G31
Operator: TGY
Date:  Aug 01, 2013
Start Time: 04:09:26
Finish Time: 04:09:33

The first 9 lines are all specific test results (tab separated), with header information below. My issue is that I need to figure out:

  1. How can I take the data above and turn it into something broken down into a standard column format to import into SQL?

  2. How can I then automate this so that I can loop through an entire folder structure? What you see above is one of hundreds of files divided across several sub-directories.

Also note that the number of test lines above the header information varies from file to file. The header information, though, remains in much the same format. This is all legacy data that cannot be regenerated, but it needs to be imported into our SQL databases.

I am thinking of using an SSIS project with a custom script to import the data: splitting the top section from the bottom by looking for the first empty row, pivoting the header data into column format, merging the two, then moving on to the next file. But I don't write much VB and I'm not sure how to approach that.
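
For concreteness, here is a rough sketch of that splitting step in plain T-SQL (not the SSIS script itself), assuming the raw lines have already been staged with their line numbers; the table and column names are only placeholders:

    -- Sketch only: assumes each file's lines have already been staged, one row per
    -- line, in something like dbo.raw_lines (filename, line_number, line_data).
    WITH first_blank AS (
        SELECT filename, MIN(line_number) AS blank_line
        FROM   dbo.raw_lines
        WHERE  LTRIM(RTRIM(line_data)) = ''
        GROUP BY filename
    )
    SELECT  r.filename,
            r.line_number,
            r.line_data,
            CASE WHEN r.line_number < fb.blank_line
                 THEN 'test result'      -- everything above the first empty row
                 ELSE 'header'           -- everything below it
            END AS section
    FROM    dbo.raw_lines AS r
    JOIN    first_blank   AS fb
           ON fb.filename = r.filename
    WHERE   LTRIM(RTRIM(r.line_data)) <> '';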

I am working in a SQL Server 2008R2 environment with access to BIDS.

Thoughts?

Upvotes: 2

Views: 2418

Answers (4)

Cor Blimey

Reputation: 83

This solution assumes your input lines are terminated with {CR}{LF}, the normal Windows way.
Tell MSSQL's Import/Export Wizard to import a Flat File; the Format is "Delimited"; the "Text Qualifier" is {CR}; the "Header Row Delimiter" is {LF}; and the OutputColumnWidth (under "Advanced") is a bit more than the longest possible line length.
It's simple and it works.

I just used this to import 23 million lines of mixed up data, and it took less than ten minutes. Now to edit it...
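
For reference, a roughly equivalent whole-line import can also be done in T-SQL with BULK INSERT, using a field terminator that never appears in the data so each line lands in a single column; the path, table name and terminator below are only examples:

    -- Example only: the table name, file path and field terminator are placeholders.
    CREATE TABLE dbo.raw_import (line_data VARCHAR(500));

    BULK INSERT dbo.raw_import
    FROM 'C:\legacy\sample_test_file.txt'
    WITH (
        ROWTERMINATOR   = '\r\n',  -- {CR}{LF}, the normal Windows line ending
        FIELDTERMINATOR = '~|~'    -- assumed never to occur, so each line stays in one column
    );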

Upvotes: 0

Derrick Bell

Reputation: 2715

I would use a different approach and use SSDT/SSIS package to import the data.

  1. Add a Script Component to read in the text file and convert it to XML. That part isn't hard; there are many examples on the web. In your script, store the XML you build in a package variable (a possible shape for that XML is sketched after this list).
  2. Add a Data Flow Task.
  3. Add an XML Source. In the XML Source you can select the XML variable you created and process either group of data present in your file. Here is some information on using the XML Source.
  4. Add a destination task to load it into the destination of your choice.
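
As a rough illustration of the kind of XML such a script might build (the element and attribute names below are invented for the example), here is the same structure shredded with T-SQL's nodes()/value(); in the package itself the XML Source component does this part for you:

    -- Illustration only: the XML shape and names are made up for the example.
    DECLARE @x XML = N'
    <TestFile serial="J245F6-2D03856">
      <Result line="1" description="VA -"     status="P" />
      <Result line="2" description="VB to 1X" status="P" />
      <Header name="Operator" value="TGY" />
    </TestFile>';

    SELECT  r.value('@line',        'INT')          AS line_no,
            r.value('@description', 'VARCHAR(100)') AS description,
            r.value('@status',      'VARCHAR(10)')  AS status
    FROM    @x.nodes('/TestFile/Result') AS t(r);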

Upvotes: 1

ninesided

Reputation: 23263

I would first concentrate on getting the data into the database in the least complicated (and least error prone) way possible. Create a table with three columns: filename, line_number and line_data. Plop all of your files into that table and then you can start to think about how to interpret the data. I would probably be looking to use PIVOT, but if different files can have different numbers of fields it may introduce complications.
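
A minimal sketch of that idea, with illustrative names only, might look like the following; the NULLIF guards the string functions on lines that contain no colon:

    -- Illustrative only: table, column and header names are examples.
    CREATE TABLE dbo.raw_lines (
        filename    VARCHAR(260),
        line_number INT,
        line_data   VARCHAR(500)
    );

    -- Once loaded, the "Key: Value" header rows can be split apart and pivoted per file.
    WITH headers AS (
        SELECT  filename,
                RTRIM(LEFT(line_data, NULLIF(CHARINDEX(':', line_data), 0) - 1))           AS [key],
                LTRIM(SUBSTRING(line_data, NULLIF(CHARINDEX(':', line_data), 0) + 1, 500)) AS [value]
        FROM    dbo.raw_lines
        WHERE   CHARINDEX(':', line_data) > 0
    )
    SELECT  filename, [Test Status], [Serial Number], [Operator]
    FROM    headers
    PIVOT   (MAX([value]) FOR [key] IN ([Test Status], [Serial Number], [Operator])) AS p;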

Upvotes: 2

thursdaysgeek

Reputation: 7936

I would start by importing the data as all character into a table with a single field, one record per line. Then, from that table, you can parse each record into the fields and field types appropriate for each line. Hopefully there is a way to figure out what kind of data each line is, whether each file is consistent in order, or whether the header record indicates information about the subsequent lines. From there, the data can be moved (parsing may take more than one pass) into a final table, stored in a format that is usable for whatever you need.
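
A sketch of that first parsing pass, assuming a staging table like the one above and the tab-separated layout shown in the question (all names are examples):

    -- Sketch only: classifies each staged line and pulls the first tab-separated
    -- field from the numbered test-result rows (dbo.raw_lines is just an example name).
    SELECT  line_data,
            CASE WHEN line_data LIKE '[0-9]%'        THEN 'test result'
                 WHEN CHARINDEX(':', line_data) > 0  THEN 'header'
                 ELSE 'blank/other'
            END AS line_type,
            LEFT(line_data, NULLIF(CHARINDEX(CHAR(9), line_data), 0) - 1) AS test_number
    FROM    dbo.raw_lines;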

Upvotes: 2
