abhishek pandey

Reputation: 93

Reading data from CSV File and Inserting it into Database

I need to read data from a CSV file and load it into a database. I am using the bcp command-line utility for this purpose. My CSV file looks like this:

First_name,Last_name,EmpID,company,languages
"Jack","Thomas","57616","IBM","C
C++
JAVA
COBOL
PERL
SQL
 "
"Tim","Cook","10001","Apple","Python
C++
Java
XML
 "

As you can see, the last column (languages) holds several values, each on its own line. The bcp command is scripted to look for the row delimiter, so it terminates the row as soon as it has read the first value of the last column. Could you please suggest how to parse this file using bcp?

Upvotes: 1

Views: 884

Answers (2)

level3looper

Reputation: 1051

Here is a SQL solution that walks through your import file and parses the data into two tables. It uses two loops: one for the 'master' table (emps) and one for the 'detail' table (langs).

Setup

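-- Selecting from #tempTable raises an error when the table does not exist yet,
-- so test for its existence with OBJECT_ID instead
IF OBJECT_ID('tempdb..#tempTable') IS NOT NULL
  DROP TABLE #tempTable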

/*
Create Table emps
(
First_Name Varchar(25),
Last_Name VarChar(25),
EmpID VarChar(10),
Company VarChar(30)
)

Create Table langs
(
EmpID VarChar(10),
Lang VarChar(15)
)
*/

Delete From langs
Delete From emps

CREATE TABLE #tempTable
(
  RowVal VarChar(Max)
)

Query

BULK INSERT #tempTable
FROM 'c:\Downloads\EmpLangs.txt' 
WITH 
(
    FIRSTROW = 2,
    ROWTERMINATOR = '\n'
)

Declare @RowV VarChar(Max)  --Matches the RowVal column so long rows are not truncated
--Use the following to get the location of each delimiter
Declare @f1q1 Int
Declare @f1q2 Int
Declare @f2q1 Int
Declare @f2q2 Int
Declare @f3q1 Int
Declare @f3q2 Int
Declare @f4q1 Int
Declare @f4q2 Int
Declare @f5q1 Int

Declare @empid VarChar(10)

Declare @vHeader Int = 1  --Is header row?

Declare vCursor CURSOR For Select RowVal  From #tempTable

  Open vCursor;
  Fetch Next From vCursor Into @RowV

  While @@FETCH_STATUS = 0  --Walk through rows to parse
  Begin

   If @vHeader = 1
      Begin     
        Set @f1q1 = CHARINDEX('"',@RowV,1)
        Set @f1q2 = CHARINDEX('"',@RowV,@f1q1+1)

        Set @f2q1 = CHARINDEX('"',@RowV,@f1q2+1)
        Set @f2q2 = CHARINDEX('"',@RowV,@f2q1+1)

        Set @f3q1 = CHARINDEX('"',@RowV,@f2q2+1)
        Set @f3q2 = CHARINDEX('"',@RowV,@f3q1+1)

        Set @f4q1 = CHARINDEX('"',@RowV,@f3q2+1)
        Set @f4q2 = CHARINDEX('"',@RowV,@f4q1+1)

        Set @f5q1 = CHARINDEX('"',@RowV,@f4q2+1)

        Insert Into emps Values
        (SUBSTRING(@RowV,@f1q1+1,@f1q2-@f1q1-1),
         SUBSTRING(@RowV,@f2q1+1,@f2q2-@f2q1-1),
         SUBSTRING(@RowV,@f3q1+1,@f3q2-@f3q1-1),
         SUBSTRING(@RowV,@f4q1+1,@f4q2-@f4q1-1) 
        )

        Set @vHeader = 0
        Set @empid = SUBSTRING(@RowV,@f3q1+1,@f3q2-@f3q1-1)
        Insert Into langs Values (@empid,SUBSTRING(@RowV,@f5q1+1,Len(@RowV)- @f5q1 + 1))  -- ADDED to get the trailing language from the header row
      End

     Fetch Next From vCursor Into @RowV
       While @@FETCH_STATUS = 0  And @vHeader = 0 And @RowV <> ' "'
         Begin
            Insert Into langs Values (@empid,@RowV)
            Fetch Next From vCursor Into @RowV
            If @RowV = ' "' 
             Begin
                If @@FETCH_STATUS = 0 
                  Begin
                     Fetch Next From vCursor Into @RowV
                     Set @vHeader = 1
                  End
             End
         End
  End;

  Close vCursor
  Deallocate vCursor

Select e.*,l.lang From emps e
INNER JOIN
langs l ON e.EmpID = l.EmpID

Result

First_Name  Last_Name   EmpID   Company Lang
Jack        Thomas      57616   IBM     C
Jack        Thomas      57616   IBM     C++
Jack        Thomas      57616   IBM     JAVA
Jack        Thomas      57616   IBM     COBOL
Jack        Thomas      57616   IBM     PERL
Jack        Thomas      57616   IBM     SQL
Tim         Cook        10001   Apple   Python
Tim         Cook        10001   Apple   C++
Tim         Cook        10001   Apple   Java
Tim         Cook        10001   Apple   XML
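
For comparison, the same master/detail split can be sketched outside the database. This is only a minimal Python illustration, not part of the SQL solution above: it assumes the file is valid quoted CSV as shown in the question, and the function name split_master_detail and the inline SAMPLE text are made up for the example. The standard csv module already copes with quoted fields that span several lines, so no manual quote counting is needed.

```python
import csv
import io

# Sample copied from the question; the ' "' line closes the quoted field.
SAMPLE = (
    'First_name,Last_name,EmpID,company,languages\n'
    '"Jack","Thomas","57616","IBM","C\n'
    'C++\n'
    'JAVA\n'
    'COBOL\n'
    'PERL\n'
    'SQL\n'
    ' "\n'
    '"Tim","Cook","10001","Apple","Python\n'
    'C++\n'
    'Java\n'
    'XML\n'
    ' "\n'
)


def split_master_detail(text):
    """Return (emps, langs) row lists shaped like the emps and langs tables."""
    emps, langs = [], []
    # newline='' lets the csv module see embedded newlines unchanged.
    for row in csv.DictReader(io.StringIO(text, newline='')):
        emps.append(
            (row['First_name'], row['Last_name'], row['EmpID'], row['company'])
        )
        for lang in row['languages'].splitlines():
            if lang.strip():  # skip the blank line before the closing quote
                langs.append((row['EmpID'], lang.strip()))
    return emps, langs


emps, langs = split_master_detail(SAMPLE)
for emp_id, lang in langs:
    print(emp_id, lang)
```

The two lists could then be written to two plain single-line files and loaded with bcp or BULK INSERT, one per table.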

Upvotes: 1

user6811411

Reputation:

I don't see much effort to find a solution or do research on your own, which is what is expected on [SO].

Here is a possible PowerShell solution that imports the CSV, converts the multi-line column to a semicolon-separated one, and exports the result as CSV.

Import-Csv .\old.csv| ForEach-Object {
    $_.Languages=$_.Languages -split "`r?`n" -ne ' ' -join ';'
    $_
} | Export-Csv .\New.csv -NoTypeInformation

This will result in all columns double quoted:

> Get-Content .\new.csv
"First_name","Last_name","EmpID","company","languages"
"Jack","Thomas","57616","IBM","C;C++;JAVA;COBOL;PERL;SQL"
"Tim","Cook","10001","Apple","Python;C++;Java;XML"

Another PowerShell one liner will remedy this:

(Get-Content .\new.csv).trim('"') -replace '","',',' | Set-Content .\new.csv

First_name,Last_name,EmpID,company,languages
Jack,Thomas,57616,IBM,C;C++;JAVA;COBOL;PERL;SQL
Tim,Cook,10001,Apple,Python;C++;Java;XML

EDIT: one combined .ps1 file

## Q:\Test\2018\12\14\SO_53777634.ps1

$FileIn = '.\old.csv'
$FileOut= '.\new.csv'

Import-Csv $FileIn | ForEach-Object {
    $_.Languages=$_.Languages -split "`r?`n" -ne ' ' -join ';'
    $_
} | Export-Csv $FileOut -NoTypeInformation

(Get-Content $FileOut).trim('"') -replace '","',',' | Set-Content $FileOut
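
If PowerShell is not available, the same flattening step can be sketched in Python. This is a rough equivalent under the same assumptions as above (valid quoted CSV, a column named languages); the function name flatten_languages, the sep parameter, and the inline SAMPLE text are hypothetical and only serve the illustration.

```python
import csv
import io

# First record from the question; the ' "' line closes the quoted field.
SAMPLE = (
    'First_name,Last_name,EmpID,company,languages\n'
    '"Jack","Thomas","57616","IBM","C\n'
    'C++\n'
    'JAVA\n'
    'COBOL\n'
    'PERL\n'
    'SQL\n'
    ' "\n'
)


def flatten_languages(text, sep=';'):
    """Rewrite CSV text so the multi-line languages field fits on one line."""
    reader = csv.DictReader(io.StringIO(text, newline=''))
    out = io.StringIO(newline='')
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, lineterminator='\n'
    )
    writer.writeheader()
    for row in reader:
        # Split on embedded newlines, drop blanks, and join with the separator.
        parts = [p.strip() for p in row['languages'].splitlines()]
        row['languages'] = sep.join(p for p in parts if p)
        writer.writerow(row)
    return out.getvalue()


flat = flatten_languages(SAMPLE)
print(flat)
```

With the default minimal quoting, the writer emits no double quotes for these values, so the second trim/replace pass is not needed here. Each record then occupies a single line that bcp can read with ',' as the field terminator.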

Upvotes: 3
