D3Young

Reputation: 35

Skip LAST Line with OPENROWSET

I'm doing a fairly straightforward import of a .csv file that is provided by a vendor, using the following command (I have abbreviated it a bit):

INSERT INTO ... FROM OPENROWSET (BULK 'CSVFileName', FORMATFILE = 'XMLFormatFileName', FIRSTROW = 2, MAXERRORS = 0) AS BulkLoadFile

The CSV file format looks like the following (linked as an image below; since I'm new to StackOverflow I cannot post images directly yet):

https://i.sstatic.net/qZMwV.jpg

My problem is the LAST ROW with the count... It causes the import to fail!

FYI >>> Yes I know you can use "SET NOCOUNT ON;" but I do not generate this file, so that is not an option.

Right now I open the file, delete the last row, and re-save it before importing. (Note: I also delete the top 2 rows shown in green since I'm already in the file, but the first two rows are not an issue as I can skip them using the FIRSTROW = 4 switch.)

So my questions are:

Is there any way to SKIP the LAST ROW?

OR

Is there any way to get a row count and maybe use the LASTROW switch? i.e. LASTROW = COUNT(*) from myCSVfile

OR

Since the last row always starts with "Total:", is there any way to add a WHERE clause? i.e. WHERE the value in the first column is not like 'Total:%'

OR

Is this something that SSIS would handle better? If so I can move this import routine over to SSIS.

Thanks in advance... I look forward to truly automating this and not having to open this file every time we import it (multiple times a day).

D3Y

Upvotes: 1

Views: 3700

Answers (4)

James Hamilton

Reputation: 51

Hopefully this will help anyone looking to do this in the future. Put the CSV into a single-column temp table and count that.

DECLARE @lastrow INT

-- Load every line of the file into a single-column staging table
DROP TABLE IF EXISTS #temp
CREATE TABLE #temp ([value] VARCHAR(MAX))

BULK INSERT #temp
FROM 'D:\Import\test.csv'
WITH (ROWTERMINATOR = '\n')

-- The count of staged lines marks the last row of the file
SELECT @lastrow = COUNT(*)
FROM #temp
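
From there, a minimal sketch of how that count could feed the actual import, continuing the batch above. The target table, file path, and format-file name are placeholders, and LASTROW = @lastrow - 1 assumes the trailing "Total:" line is the only row to drop; OPENROWSET only accepts literal options, hence the dynamic SQL:

-- Hypothetical continuation: plug the count (minus the trailing total line) into LASTROW
DECLARE @sql NVARCHAR(MAX)

SET @sql = N'INSERT INTO dbo.MyImportTable
SELECT *
FROM OPENROWSET (BULK ''D:\Import\test.csv'',
     FORMATFILE = ''D:\Import\FormatFile.xml'',
     FIRSTROW = 2,
     LASTROW = ' + CAST(@lastrow - 1 AS NVARCHAR(10)) + N') AS BulkLoadFile'

EXECUTE sp_executesql @sql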

Upvotes: 0

Dropbear

Reputation: 1

I just made an OPENROWSET command that precedes the one I need to know the last row for and pointed it at the file in question. The format file defines a single column whose only delimiter is the row terminator (in this case \n). This was my very simple format file:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\n" MAX_LENGTH="10000" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="Column0" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>

Then I used that count minus 1 going forward:

DECLARE @File VARCHAR(200);

SET @File = '''Dir\path\';
SET @File += 'filename.txt''';

DECLARE @SQL4Rows NVARCHAR(max) -- dynamic SQL: count every row in the file, including the trailing total line

SET @SQL4Rows = (
        'select count (*) FROM OPENROWSET(BULK 
        ' + @file + 
        ', 
        FORMATFILE = 
        ''\\fullformatfilepath_rowCountOnly.xml'') T'
        )

DECLARE @SQL4RowsTable AS TABLE (col INT)

INSERT INTO @SQL4RowsTable
EXECUTE sp_executesql @SQL4Rows

DECLARE @Rows VARCHAR(6)

SET @Rows = (
        SELECT *
        FROM @SQL4RowsTable
        ) - 1 -- the last line of the file is the row count, thus the minus 1

DECLARE @SQL4Results NVARCHAR(max) --main sql to get results

SET @SQL4Results = (
        '
SELECT column, etc
INTO #inlinetemp
FROM OPENROWSET(BULK ' + @File + 
        ', 
        FORMATFILE = 
        ''\\fullformatfilepath.xml'',LASTROW = ' + @Rows + 
        ') T
WHERE column = ''la''
SELECT DISTINCT columns
FROM #inlinetemp ab
LEFT JOIN db..tblaccounts a
    ON ab.accountnumber = a.AccountNumber
ORDER BY column asc '
        )

EXECUTE sp_executesql @SQL4Results

Upvotes: 0

D3Young

Reputation: 35

WORKAROUND found...

I was using MAXERRORS = 0

If I assume (yes, I know the joke here about assuming) that I will have 1 (expected) error related to the last line, I can use MAXERRORS = 1 and my data imports just fine. The last line (Total:xxxx) is imported into my table, but I can just ignore that row when I use the table further down in my update pipeline.

Also worth noting: if MAXERRORS is not specified, the default is 10, so I could have avoided this by not even using the MAXERRORS switch... Go figure.

Background from MSDN (msdn.microsoft.com/en-us/library/ms188365.aspx): "Specifies the maximum number of syntax errors allowed in the data before the bulk-import operation is canceled. Each row that cannot be imported by the bulk-import operation is ignored and counted as one error."
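
For reference, a minimal sketch of what this workaround looks like in the abbreviated command from the question; the target table (dbo.MyImportTable) and the FirstColumn name used to discard the imported total row are placeholders:

-- Allow the one expected error caused by the trailing "Total:" line
INSERT INTO dbo.MyImportTable   -- placeholder target table
SELECT *
FROM OPENROWSET (BULK 'CSVFileName',
     FORMATFILE = 'XMLFormatFileName',
     FIRSTROW = 2,
     MAXERRORS = 1) AS BulkLoadFile

-- Discard the imported total row before it reaches the update pipeline
DELETE FROM dbo.MyImportTable
WHERE FirstColumn LIKE 'Total:%'   -- FirstColumn is a placeholder column name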

Upvotes: 0

ivankristof

Reputation: 51

I see two options:

  1. Insert the data into a temp table, clean out the unwanted rows, and then insert into the final (production) table. This can be done using T-SQL only (see the sketch after this list).

  2. Using an SSIS package (specifically a Data Flow Task), you could use a Conditional Split to filter the unwanted rows away, e.g. https://www.simple-talk.com/sql/ssis/ssis-basics-using-the-conditional-split/
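
A minimal sketch of option 1, reusing the placeholder file and format-file names from the question. The staging load still has to tolerate the trailing total line (here by leaving MAXERRORS above zero, as in the workaround answer), and #staging, FirstColumn, and dbo.MyImportTable are placeholder names:

-- 1) Stage the whole file, trailing total row included
SELECT *
INTO #staging
FROM OPENROWSET (BULK 'CSVFileName',
     FORMATFILE = 'XMLFormatFileName',
     FIRSTROW = 2,
     MAXERRORS = 1) AS BulkLoadFile

-- 2) Clean the unwanted rows out of the temp table
DELETE FROM #staging
WHERE FirstColumn LIKE 'Total:%'   -- FirstColumn is a placeholder column name

-- 3) Insert the cleaned data into the final (production) table
INSERT INTO dbo.MyImportTable
SELECT *
FROM #staging

DROP TABLE #staging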

Upvotes: 0
