Reputation: 90
I have been searching for an efficient way of uploading hundreds of text files in the same folder to SQL Server with one click of a button. I perform this every month, manually mapping the tables. It takes considerable time to manually select each flat file (.txt) and map it to the correct loading table for each extract.
I have tried many different approaches, including SSIS. These are too manual and take too much time to perform and, especially, to maintain.
I have had an idea: take, for example, 5 flat files in the same folder and upload them to the SQL Server into a schema [upload] as tables named after their respective file names. The final output would look like this.
These tables could then be assigned a location manually in a lookup table and inserted into the correct tables. This would remove almost all manual work aside from maintaining a location table on the server.
Note that it is essential these tables are also created at load time, as the files have unique names when extracted. Also note that the files are pipe (|) delimited text files rather than CSV.
Does anyone know of a way this can be done? It would have to avoid manually built Visual Studio packages and the like; rather, a bulk insert of every flat file into SQL Server under its respective file name.
An efficient query/process could save me days. It would also be a HUGE plus if this can run server-side at scheduled times (e.g. via SQL Server Agent) so the upload doesn't depend on an active PC.
Upvotes: 1
Views: 3709
Reputation: 1047
Here is a stored procedure that uses a dynamic script to BULK INSERT the file into the table [My_bulk_insert_data], assuming that the first line of the file contains the column names:
create proc usp_bulkinsert_file(@fullfilename as nvarchar(max))
as
begin
-- Extract the file name (everything after the last backslash)
declare @filename as nvarchar(max)
set @filename = substring(@fullfilename, len(@fullfilename) - charindex('\', reverse(@fullfilename)) + 2, len(@fullfilename))
declare @script as nvarchar(max)
-- Drop the staging table and the destination table if they already exist
if exists(select 1 from sysobjects where name = 'My_bulk_insert_data') drop table My_bulk_insert_data
set @script = 'if exists(select 1 from sysobjects where name = ''' + @filename + ''') drop table [' + @filename + ']'
exec(@script)
-- Staging table: the whole header line is loaded into a single column
create table My_bulk_insert_data(data nvarchar(max))
-- Load only the first (header) line. The field terminator is the null character (\0),
-- assumed not to appear in the data, so the entire line lands in the single [data] column.
set @script = '
BULK INSERT dbo.My_bulk_insert_data
FROM ''' + @fullfilename + '''
WITH
(
FIELDTERMINATOR = ''\0'',
FIRSTROW = 1,
LASTROW = 1
)
'
print @script
exec(@script)
-- Build a CREATE TABLE statement from the header row
-- (caveat: string_split does not guarantee ordinal order)
declare @data as nvarchar(max)
select top 1 @data = data from My_bulk_insert_data
set @script = 'Create Table [' + @filename + ']('
select @script = @script + '[' + value + '] nvarchar(max),' from string_split(@data, '|')
set @script = substring(@script, 1, len(@script) - 1) + ')'
print @script
exec(@script)
-- Reload the file into the new table, now split on the pipe delimiter and skipping the header row
set @script = 'BULK INSERT [' + @filename + '] from ''' + @fullfilename + '''
with
(
FIELDTERMINATOR = ''|'',
FIRSTROW = 2
)
'
print @script
exec(@script)
end
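For the "one click" part, the procedure can be driven by a loop over the folder contents. A minimal sketch, assuming the files sit in a hypothetical C:\MonthlyExtracts folder and using the undocumented (but long-standing) xp_dirtree to list them server-side:
-- Minimal sketch: load every .txt file in a (hypothetical) folder server-side.
-- xp_dirtree is undocumented; a SQL Agent PowerShell step is an alternative.
DECLARE @dir nvarchar(260) = N'C:\MonthlyExtracts\';
DECLARE @files TABLE (subdirectory nvarchar(260), depth int, [file] bit);
INSERT INTO @files EXEC master.sys.xp_dirtree @dir, 1, 1;
DECLARE @name nvarchar(260), @full nvarchar(520);
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT subdirectory FROM @files WHERE [file] = 1 AND subdirectory LIKE N'%.txt';
OPEN c;
FETCH NEXT FROM c INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @full = @dir + @name;
    EXEC dbo.usp_bulkinsert_file @fullfilename = @full;
    FETCH NEXT FROM c INTO @name;
END
CLOSE c;
DEALLOCATE c;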
Results Screenshot
Upvotes: 1
Reputation: 5594
Note: at the bottom I added asynchronous code, to be run in a .NET Core 3.1 (or later) console application.
I kind of took this as a challenge to see if I could do it. Here is what I came up with in C#...
Basic concept:
Loop through the files
Read each file into a DataTable, using pipe (|) as the delimiter
Read the header row into a string List
Create a table based on the headers, with every column of data type varchar(255)
Load the DataTable into the new table using SqlBulkCopy
public static string cstr = "Enter your connection string to destination here";
//If you are going to use a Script Task inside SSIS, this is what goes into its Main method.
static void Main(string[] args)
{
var dir = @"C:\whereever";
foreach (var file in new DirectoryInfo(dir).GetFiles().Where(f => f.Extension.Contains("txt")))
{
DataTable dt = ConvertCSVtoDataTable(file.FullName, '|');
//getColumnNames into an object
List<string> colNames = new List<string>();
foreach (DataColumn dc in dt.Columns)
colNames.Add(dc.ColumnName);
//create table
string tableName = Path.GetFileNameWithoutExtension(file.Name);
createTable(colNames, tableName);
//load table
loadDtIntoDb(dt, colNames, tableName);
}
}
public static void loadDtIntoDb(DataTable dt, List<string> cols, string tableName)
{
using (var conn = new SqlConnection(cstr))
{
using (var bcp = new SqlBulkCopy(conn))
{
bcp.DestinationTableName = tableName;
foreach (var col in cols)
bcp.ColumnMappings.Add(col, col);
conn.Open();
bcp.WriteToServer(dt);
}
}
}
public static void createTable(List<string> dcs,string tableName)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(string.Format("Create table {0}(", tableName));
foreach (var dc in dcs)
sb.AppendLine(string.Format("{0} varchar(255),", dc));
//Clean sql by removing the final ',' and adding a close paren.
//AppendLine leaves a trailing newline after the last comma, so trim whitespace first.
string sql = sb.ToString().TrimEnd().TrimEnd(',') + ")";
using (var conn = new SqlConnection(cstr))
{
using (var cmd = new SqlCommand(sql, conn))
{
conn.Open();
cmd.ExecuteNonQuery();
}
}
}
public static DataTable ConvertCSVtoDataTable(string strFilePath, char delim)
{
DataTable dt = new DataTable();
using (StreamReader sr = new StreamReader(strFilePath))
{
string[] headers = sr.ReadLine().Split(delim);
foreach (string header in headers)
{
dt.Columns.Add(header);
}
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(delim);
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
return dt;
}
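To make createTable concrete: for a hypothetical Customers.txt whose header row is Id|Name|Email, the generated statement would be roughly the following (file and column names are illustrative):
-- Illustrative SQL built by createTable for the header row Id|Name|Email
Create table Customers(
Id varchar(255),
Name varchar(255),
Email varchar(255))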
Here is a list of my using statements:
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
using System.Text;
using Microsoft.Data.SqlClient;
I see you are looking for efficiency. Here is how to do it asynchronously:
public static string cstr = "Enter your connection string to destination here";
//This cannot be done in SSIS
static async Task Main(string[] args)
{
var dir = @"C:\whereever";
var files = new DirectoryInfo(dir).GetFiles().Where(f => f.Extension.Contains("txt"));
await Task.WhenAll(files.Select(f => processFile(f)));
}
public static async Task processFile(FileInfo file)
{
DataTable dt = await ConvertCSVtoDataTable(file.FullName, '|');
//getColumnNames into an object
List<string> colNames = new List<string>();
foreach (DataColumn dc in dt.Columns)
colNames.Add(dc.ColumnName);
//create table
string tableName = Path.GetFileNameWithoutExtension(file.Name);
await createTable(colNames, tableName);
//load table
await loadDtIntoDb(dt, colNames, tableName);
}
public static async Task loadDtIntoDb(DataTable dt, List<string> cols, string tableName)
{
using (var conn = new SqlConnection(cstr))
{
using (var bcp = new SqlBulkCopy(conn))
{
bcp.DestinationTableName = tableName;
foreach (var col in cols)
bcp.ColumnMappings.Add(col, col);
await conn.OpenAsync();
await bcp.WriteToServerAsync(dt);
}
}
}
public static async Task createTable(List<string> dcs,string tableName)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine(string.Format("Create table {0}(", tableName));
foreach (var dc in dcs)
sb.AppendLine(string.Format("{0} varchar(255),", dc));
//Clean sql by removing the final ',' and adding a close paren.
//AppendLine leaves a trailing newline after the last comma, so trim whitespace first.
string sql = sb.ToString().TrimEnd().TrimEnd(',') + ")";
using (var conn = new SqlConnection(cstr))
{
using (var cmd = new SqlCommand(sql, conn))
{
await conn.OpenAsync();
await cmd.ExecuteNonQueryAsync();
}
}
}
public static async Task<DataTable> ConvertCSVtoDataTable(string strFilePath, char delim)
{
DataTable dt = new DataTable();
using (StreamReader sr = new StreamReader(strFilePath))
{
string[] headers = (await sr.ReadLineAsync()).Split(delim);
foreach (string header in headers)
{
dt.Columns.Add(header);
}
while (!sr.EndOfStream)
{
string[] rows = (await sr.ReadLineAsync()).Split(delim);
DataRow dr = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
}
return dt;
}
Upvotes: 1
Reputation: 37358
In this answer, I will try to give some information and to provide a working example of importing several flat files with different metadata into separate SQL Server tables.
Since SSIS requires fixed metadata for the data source and destination, I suggest using C# code rather than automating the SSIS package creation. If SSIS is required, you should use a Script Task to achieve this.
In similar cases, I generally use C# or another scripting language such as Python.
Importing more than 100 flat files to more than 100 tables that should be created at runtime (with no fixed structure) requires creating more than 100 SSIS packages. Even if this process is automated using one of the SSIS automation tools, it is hard to manage and debug a high number of SSIS packages.
If the destination tables mostly have similar structures, it is better to import all flat files into a unified SQL Server destination using a schema-mapping approach; you can refer to the following answer:
All code provided in this solution can be improved, but it was written as a prototype.
I created a class TextImport that is used to loop over all flat files within a given directory, read each flat file's metadata, and store each file's data within a System.Data.DataTable object. (Note that the Microsoft.VisualBasic assembly should be added as a reference in order to use the TextFieldParser class.)
TextImport txtimp = new TextImport(filename, true, 0); // TextImport(string filename, bool containsHeader, int RowsToSkip)
txtimp.BuildDataTableStructure(); //Read flat file metadata
DataTable dt = txtimp.FillDataTable(); //Convert the flat file to a DataTable object
The TextImport class tries to detect the file metadata (number of columns, delimiter, text qualifier, encoding) based on the first 8 lines (an arbitrary number) of the flat file.
Then this System.Data.DataTable is passed to another class called SQLExport, which generates and executes a CREATE TABLE statement based on the DataTable metadata.
SQLExport.CreateDestinationTable(DataTable dt);
Finally, two insert methods are provided:
Generate an INSERT INTO () ... VALUES (),(),()... statement and execute it to insert the data into the SQL destination
Use the System.Data.SqlClient.SqlBulkCopy class to insert the data using the BULK INSERT method
The main code should look like the following:
static void Main(string[] args)
{
//You should set your database connection string
string connectionstring = @"Data Source=.\SQLINSTANCE;Initial Catalog=tempdb;integrated security=SSPI;";
//This is the schema of the destination table
string Schema = "upload";
//You should set the text files directory
string directory = @"E:\TextFiles";
using (SQLExport sqlExp = new SQLExport(connectionstring, Schema))
{
//if you don't want to traverse subfolders use System.IO.SearchOption.TopDirectoryOnly
foreach (string filename in System.IO.Directory.GetFiles(directory, "*.txt", System.IO.SearchOption.AllDirectories))
{
using(TextImport txtimp = new TextImport(filename, true, 0))
{
txtimp.BuildDataTableStructure();
DataTable dt = txtimp.FillDataTable();
dt.TableName = System.IO.Path.GetFileName(filename);
sqlExp.CreateDestinationTable(dt);
//Insert using BULK INSERT
sqlExp.InsertUsingSQLBulk(dt);
//Or create and execute an INSERT INTO statement
//sqlExp.InsertIntoDb(dt);
}
}
}
}
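To make the two insert options concrete, here is roughly what would reach the server for a hypothetical file names.txt whose header line is ID|Name (the exact SQL is generated inside the SQLExport class; all names and values below are illustrative):
-- Illustrative statements for a hypothetical names.txt with header ID|Name
CREATE TABLE [upload].[names.txt] (
    [ID] NVARCHAR(255),
    [Name] NVARCHAR(255)
);
-- InsertIntoDb generates a multi-row INSERT ... VALUES statement:
INSERT INTO [upload].[names.txt] ([ID], [Name])
VALUES (N'1', N'Alice'),
       (N'2', N'Bob');
-- InsertUsingSQLBulk instead streams the DataTable through SqlBulkCopy,
-- which uses the bulk-load API rather than individual INSERT statements.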
I tested this solution on two flat files stored at the following link (the first result of my Google search). As shown in the screenshots below, the tables are created successfully and the data is inserted.
Note that all columns are created as NVARCHAR(255); you can change this within the SQLExport.cs file.
You can check the complete code in the following GitHub repository (.NET Framework 4.6.1 C# console application):
If you decide to use a Script Task to run this code, you should only copy the classes from the GitHub project into your Script Task project. Remember to add Microsoft.VisualBasic as a reference.
You can simply edit the C# console application to take the connection string and the text files directory as arguments, then run it using a CmdExec job step:
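A minimal sketch of such an Agent job, assuming the console application has been published as C:\Jobs\FlatFileImport.exe (the path, job name, schedule, and arguments below are all assumptions):
USE msdb;
-- Create the job and a CmdExec step that runs the console application
EXEC dbo.sp_add_job @job_name = N'Monthly flat file import';
EXEC dbo.sp_add_jobstep
    @job_name  = N'Monthly flat file import',
    @step_name = N'Run importer',
    @subsystem = N'CmdExec',
    @command   = N'C:\Jobs\FlatFileImport.exe "Data Source=.;Initial Catalog=tempdb;Integrated Security=SSPI;" "E:\TextFiles"';
-- Run at 02:00 on the first day of every month
EXEC dbo.sp_add_jobschedule
    @job_name = N'Monthly flat file import',
    @name = N'Monthly 02:00',
    @freq_type = 16,      -- monthly
    @freq_interval = 1,   -- day 1 of the month
    @active_start_time = 020000;
EXEC dbo.sp_add_jobserver @job_name = N'Monthly flat file import';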
Or even by using an SSIS package:
Another alternative is to use Windows Task Scheduler to periodically run the C# console application.
Upvotes: 3
Reputation: 95950
I have posted this as a new answer, rather than editing my existing one, as my original answer answers the original question.
Saying that, this answer simply expands on the existing answer. Unfortunately, no details were given on what the data types of the data in the table should be, so I have assumed that all data will be nvarchar(200). You can change that in the dynamic statement; however, note that this solution assumes all data has the same data type. If this isn't true and you need strongly typed data, it's time to ask a new question (don't be a chameleon).
Anyway, as I said, this expands on my existing answer. You'll note an additional section of code; this handles the creation of the table, dynamically.
I create a temporary table (which is limited to the scope of the Invoke-Sqlcmd it is created in) to INSERT the first row from your file, which I assume has the headers. For the field terminator of this first load I use a character I assume won't appear in your data, the null character (\0), so that the entire header line lands in a single column; if it can appear, use a different character. I then only INSERT the first row (hence FIRSTROW and LASTROW both having the value 1). This INSERTs the first row from the file as one delimited value.
Next I create a dynamic statement that splits that value into rows and reaggregates it, quoting the field names and defining the data type (nvarchar(200)). You will need to use a string splitter that respects ordinal positions here, NOT STRING_SPLIT. I use DelimitedSplitN4K_LEAD (which is located in my Utility database), and its definition can be found here.
Finally, that dynamic statement is executed, creating the table with the column names in the right order, all with the data type nvarchar(200); Invoke-Sqlcmd runs the whole batch.
Then we go back to the original solution and BULK INSERT the data.
$ImportFolder = "/home/mssql/ImportSample" #The Folder your files are in
$Instance = "." #The Instance you are inserting the data into. "." means local host
$Database = "ImportSample" #The database you are inserting the data into
$User = "YourSqlLogin" #Placeholder SQL auth credentials; omit -Username/-Password below for Windows Authentication
$Password = "YourPassword"
$Files = Get-ChildItem -Path $ImportFolder -Filter "*.txt" #Get all the txt files
#Loop the files
foreach($File in $Files){
#Create the creation statement
Write-Verbose "Defining table creation script for $($File.Name)."
$CreateSQL = "CREATE TABLE #Rows (DelimitedData nvarchar(4000));`n" +
"BULK INSERT #Rows FROM '$($File.FullName.Replace("'","''"))' WITH(FORMAT = 'CSV', FIELDTERMINATOR = '|', ROWTERMINATOR = '0x0a', FIRSTROW = 1, LASTROW = 1);`n" +
"DECLARE @SQL nvarchar(MAX);`n" +
"SELECT @SQL = N'CREATE TABLE dbo.[$($File.BaseName.Replace("]","]]"))] (' + STRING_AGG(QUOTENAME(DSL.Item) + N' nvarchar(200)',N',') WITHIN GROUP (ORDER BY DSL.ItemNumber) + N');' FROM #Rows R CROSS APPLY Utility.fn.DelimitedSplitN4K_LEAD(R.DelimitedData,N',') DSL;`n" +
"EXEC sys.sp_executesql @SQL;"
#Execute the statement
Write-Verbose "Table creation script for $($File.Name) generated:"
Write-Verbose $CreateSQL
Invoke-SqlCmd -ServerInstance $Instance -Database $Database -Query $CreateSQL -Username $User -Password $Password
#Create the Insert statement
Write-Verbose "Defining BULK INSERT script for $($File.Name)."
$InsertSQL = "BULK INSERT dbo.[$($File.BaseName.Replace("]","]]"))] FROM '$($File.FullName.Replace("'","''"))' WITH (FORMAT = 'CSV', FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', FIRSTROW = 2);"
Write-Verbose "BULK INSERT script for $($File.Name) generated:"
Write-Verbose $InsertSQL
#Execute the statement
Invoke-SqlCmd -ServerInstance $Instance -Database $Database -Query $InsertSQL -Username $User -Password $Password
}
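For reference, for a hypothetical file Accounts.txt whose header line is AccountID|AccountName|Balance, the dynamic statement built above would come out roughly as:
-- Illustrative @SQL generated from the header row (names are hypothetical)
CREATE TABLE dbo.[Accounts] ([AccountID] nvarchar(200),[AccountName] nvarchar(200),[Balance] nvarchar(200));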
A couple of screenshots demonstrating this working (as it can't be demonstrated with a fiddle):
Upvotes: 1
Reputation: 95950
As I mentioned, I would use something like PowerShell to do this. The script to do so is actually quite simple.
#Below is a Linux path, as I am running SQL Server (and PowerShell) on Linux
$ImportFolder = "/home/mssql/ImportSample" #The Folder your files are in
$Instance = "." #The Instance you are inserting the data into. "." means local host
$Database = "ImportSample" #The database you are inserting the data into
$Files = Get-ChildItem -Path $ImportFolder -Filter "*.txt" #Get all the txt files
#Loop the files
foreach($File in $Files){
#Create the statement
$SQL = "BULK INSERT dbo.[$($File.BaseName.Replace("]","]]"))] FROM '$($File.FullName.Replace("'","''"))' WITH (FORMAT = 'CSV', FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a', FIRSTROW = 2);"
Write-Debug $SQL
#Execute the statement
Invoke-SqlCmd -ServerInstance $Instance -Database $Database -Query $SQL
}
This assumes Windows Authentication; however, if you are using SQL Authentication, you can pass the -Username and -Password switches to Invoke-SqlCmd.
Note: This answer was written before the necessity to CREATE the tables was added to the question. It does not meet that new requirement; however, I am leaving it here as it may still prove useful to others with a question similar to the original version of this one, without said requirement.
Upvotes: 2