Bruce Long
Bruce Long

Reputation: 733

OPENROWSET or OPENDATASOURCE Example to get Data from .xlsx or .csv file on file system of Remote Server

There are plenty of examples around showing how to use OPENROWSET and OPENDATASOURCE to open queries against tables on remote SQL servers, and to get data from files on other servers. Unfortunately, none of them tell me how to do the specific thing that I need to do, which is use one of these commands to get either a .csv or .xlsx Excel file from a remote server using the server IP address and windows login where the server has no SQL Server instance installed - only either IIS7 or 8.

It would be great if Microsoft documentation didn't omit basic examples of how to use their tools in the most likely of ways. It takes no real intelligence to try 40 odd different combinations of parameters list: it's just an inefficient waste of time. Developers have more important things to do than waste hours trying to discover some 'secret knowledge' recipe which is really just a not-properly-documented variant of the command parameters, and pretending that this is something to do with being clever.

So to assuage my philosophical angst, could someone please provide an example of how to use either OPENROWSET or OPENDATASOURCE to get/select content from either a .csv file or an excel spreadsheet using the remote server IP address XXX.XXX.XXX.XXX, a port number if it should be included, the file system path correctly appended to that or included in the command, and the correct arrangement of the username and password parameters. There is no SQL server instance on the remote server - just IIS and a website with windows auth. If you have an example that works with SQL server instance on the remote server, that will do (although I think some of the REMOTEDATASOURCE examples cover this already) but I REALLY want an example where I don't have to have an SQL server instance on the remote web server.

I know that you can use linked servers for this, but there is some extra baggage to put on the remote server containing the file to access, and in my case the server containing the excel or .csv text file will not even have a SQL Server instance on it - only IIS and a website.

(Also - can someone confirm whether you can use linked servers with NO SQL Server or other database server instance on the remote server with the desired text data file? Is it linked database servers only, or linked servers where the remote server can just be a windows and web server with no SQL server installed on it?)

This example is handy, but does not tell me if SERVERPROPERTY('MachineName') AS nvarchar(128)) can contain an IP address of a remote windows server with no SQL server instance on it, and it is not about accessing text files.

This example comes blisteringly close, but there does not seem to be a positive outcome in the post, and there is no text file involved - only a DB query so the remote system has SQL server installed?:

SELECT * 
FROM OPENROWSET('PACISOleDb', '192.168.200.172';'admin';'admin', 'SELECT * FROM DB')

Existing examples that are good but don't help much:

SELECT *
FROM OPENROWSET('SQLNCLI',
   'DRIVER={SQL Server};SERVER=YourServer;UID=UserID;PWD=Password',
   'select * from sys.sysobjects')

(Source)

SELECT ContactName, CompanyName, ContactTitle
FROM OPENDATASOURCE('SQLOLEDB',
              'Data Source=ServerName;User ID=MyUID;Password=MyPass' ).Northwind.dbo.Customers

(source)

-- SQL Server 2012
SELECT
    *
FROM
    OPENDATASOURCE ('SQLNCLI11', 
   'Data Source=SQLInstanceName;Catalog=DBName;User ID=SQLLogin;Password=Password;').DBName.SchemaName.TableName


SELECT *
FROM OPENROWSET('SQLNCLI11',
   'DRIVER={SQL Server};SERVER=SQLInstanceName;UID=SQLLogin;PWD=Password',
   'select * from DBName..TableName')  

--Access DB
SELECT * FROM OPENDATASOURCE ('Microsoft.ACE.OLEDB.12.0', 
                              'Data Source=D:\MyDB\MyAccessDB.accdb')...TableName   

(source)

SELECT *
FROM OPENROWSET('SQLNCLI',
   'DRIVER={SQL Server};SERVER=MyServer;UID=MyUserID;PWD=MyCleverPassword',
   'select @@ServerName') 

(source. I take it that {SQL Server} here has to be a SQL server name - not a remote IP Address? It is not clear.)

SELECT *
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                'Excel 8.0;Database=C:\test.xls;HDR=No',
                'SELECT * FROM [Sheet1$]') b

(source)

--Excel 2007-2010
SELECT * --INTO #productlist
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0 Xml;HDR=YES;Database=C:\temp\Products.xlsx',
    'SELECT * FROM [ProductList$]');

--Excel 97-2003
SELECT * --INTO #productlist
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;HDR=YES;Database=C:\temp\Products.xls',
    'select * from [ProductList$]');

(source. Again - no remote server, no IP server address, no credentials)

Lots to chose from here. GREAT blog post, but not useful for my specific ends:

SELECT * 
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
  'Excel 12.0 Xml;
   Database=C:\DataFiles\EmployeeData1.xlsx',
   [vEmployee$]); 

SELECT * 
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
  'Data Source=C:\DataFiles\EmployeeData1.xlsx;
   Extended Properties=Excel 12.0 Xml')...[vEmployee$];

SELECT * 
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
  'Excel 12.0 Xml; HDR=YES;
   Database=C:\DataFiles\EmployeeData1.xlsx',
   [vEmployee$]);

SELECT * 
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
  'Excel 12.0 Xml; HDR=NO;
   Database=C:\DataFiles\EmployeeData1.xlsx',
   [vEmployee$]);

SELECT * INTO EmployeeInfo3
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
  'Excel 12.0 Xml; HDR=YES; IMEX=1;
   Database=C:\DataFiles\EmployeeData1.xlsx',
   [vEmployee$]);

(source)

The MS openrowset documentation has an '|' in the command schema suggesting this cannot be done remotely:

SELECT *
FROM OPENROWSET(BULK N'D:\XChange\test-csv.csv',
    FORMATFILE = N'D:\XChange\test-csv.fmt', 
    FIRSTROW=2, 
    FORMAT='CSV') AS cars;  

H. Accessing data from a CSV file without a format file:

    tsql
     Copy
    SELECT * FROM OPENROWSET(
       BULK 'C:\Program Files\Microsoft SQL Server\MSSQL14.CTP1_1\MSSQL\DATA\inv-2017-01-19.csv',
       SINGLE_CLOB) AS DATA;

So I am thinking that the correct OR ONLY approach involves OPENDATASOURCE with the jet driver present and accessible on the remote server. However, where in the following example from MS does the IP address of the remote server and the login password and username go? If it cannot then that seems contradictory to the claims of the capability of the command in the documentation (based on the words they use), and it would be great if they could say 'you cannot do this', as it is pretty clear that most people will try...

SELECT * FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',  
'Data Source=C:\DataFolder\Documents\TestExcel.xls;Extended Properties=EXCEL 5.0')...[Sheet1$] ;

So,

Something like:

SELECT * FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0'|'Microsoft.Jet.OLEDB.4.0'|'PACISOleDb', 'Data Source=XXX.XXX.XXX.XXX\DataFolder\Documents\TestExcel.xls';User ID=MyUID;Password=MyPass;Extended Properties=EXCEL 5.0')...[Sheet1$] ;

or maybe

(TWIDDLING THUMBS - BORING BORING BORING)

I should give up and use a linked server:

EXEC sp_addlinkedserver
  @server = 'ExcelLinkSrv1',
  @srvproduct = 'Excel', 
  @provider = 'Microsoft.ACE.OLEDB.12.0',
  @datasrc = 'C:\DataFiles\EmployeeData1.xlsx',
  @provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1';
GO

But again - can I have:

@server = '202.217.XXX.XXX'

?

Yes - I know that you would normally not hard code it, but let's start simple in SSMS. I wanted to avoid linked server for different reasons. How do you do it with OPENROWSET or else OPENDATASOURCE against excel spreadsheet or else .csv file?

You can have all of my reputation points or whatever they are if your solution or information works (and someone else did not get them first), because I don't care about that stuff.

Upvotes: 12

Views: 33261

Answers (4)

Coruscate5
Coruscate5

Reputation: 2543

An Excel connection string has specific requirements, which are based on connection DLL string mappings based on the provider you have selected (ACE/JET). This then is translated to behavior defined in the registry, which is difficult to parse.

The underlying technology used to interact with your provided string is OLE Data Access which is of course COM based.

As others pointed out, you may use a sharepath, but I wanted to provide more detail.

Information about the internals of these providers is difficult to come by, and official MSFT documentation on the matter is no longer directly accessible online (I would imagine there is an archive somewhere on Excel Data Sources in one giant PDF, but I was unable to locate it). COM-based reading material is generally all been deprecated or removed, despite its integral role in the Windows OS.

I share your sentiment about these commands. If it helps at all, here are the details of the "secret" commands within OPENDATASOURCE, taken from https://www.codeproject.com/Articles/37055/Working-with-MS-Excel-xls-xlsx-Using-MDAC-and-Oled (at risk of being lost as well):

The connectionstring has some parts:

Provider: It is the main oledb provider that is used to open the Excel sheet. This will be Microsoft.Jet.OLEDB.4.0 for Excel 97 onwards Excel file format and Microsoft.ACE.OLEDB.12.0 for Excel 2007 or higher Excel file format (One with xlsx extension)

Data Source: It is the entire path of the Excel workbook. You need to mention a dospath that corresponds to an Excel file. Thus, it will look like: Data Source=C:\testApp.xls".

Extended Properties (Optional): Extended properties can be applied to Excel workbooks which may change the overall activity of the Excel workbook from your program. The most common ones are the following:

HDR: It represents Header of the fields in the Excel table. Default is YES. If you don't have fieldnames in the header of your worksheet, you can specify HDR=NO which will take the columns of the tables that it finds as f1,f2 etc.

ReadOnly: You can also open Excel workbook in readonly mode by specifying ReadOnly=true; By default, Readonly attribute is false, so you can modify data within your workbook.

FirstRowHasNames: It is the same as HDR, it is always set to 1 ( which means true) you can specify it as false if you don't have your header row. If HDR is YES, provider disregards this property. You can change the default behaviour of your environment by changing the Registry Value [HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\FirstRowHasNames] to 00 (which is false)

MaxScanRows: Excel does not provide the detailed schema defination of the tables it finds. It need to scan the rows before deciding the data types of the fields. MaxScanRows specifies the number of cells to be scanned before deciding the data type of the column. By default, the value of this is 8. You can specify any value from 1 - 16 for 1 to 16 rows. You can also make the value to 0 so that it searches all existing rows before deciding the data type. You can change the default behaviour of this property by changing the value of [HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows] which is 8 by default. Currently, MaxScanRows is ignored, so you need only to depend on TypeGuessRows Registry value. Hope Microsoft fixes this issue to its later versions.

IMEX: (A Caution) As mentioned above, Excel will have to guess a number or rows to select the most appropriate data type of the column, a serious problem may occur if you have mixed data in one column. Say you have data of both integer and text on a single column, in that case, Excel will choose its data type based on majority of the data. Thus it selects the data for the majority data type that is selected, and returns NULL for the minority data type. If the two types are equally mixed in the column, the provider chooses numeric over text.

For example, in your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the provider returns five (5) numbers and three (3) null values.

To work around this problem for data, set "IMEX=1" in the Extended Properties section of the connection string. This enforces the ImportMixedTypes=Text registry setting. You can change the enforcement of type by changing [HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes] to numeric as well.

Thus if you look into the simple connectionstring with all of them, it will look like:

    Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\testexcel.xls;
    Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1;MAXSCANROWS=15;READONLY=FALSE\""

    or:
    Copy Code

    Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\testexcel.xlsx;
    Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;MAXSCANROWS=15;READONLY=FALSE\""

We need to place extended properties into Quotes(") as there are multiple number of values.

Upvotes: 1

John
John

Reputation: 9

Here is a simple SQL query for CSV import using opendatasource and the access text driver 64 bit

SELECT * FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
                             'Text;Database=D:\TEMP;HDR=YES')..[pdf_attachment#csv]

Upvotes: 0

marktwo
marktwo

Reputation: 481

If you want to link to csv files, Erland Sommarskog has a very good article: here, which discusses the practical use of OPENROWSET (BULK) , as well as BCP and BULK IMPORT . As he points out in detail, there are number of gotchas, especially if you are stuck with a version of SQL Server before 2017.

The layout of the csv file may not allow you to use OPENROWSET (BULK) without some prior processing. The article gives full details. If it is possible, however, it is by far the fastest method for gulping large volumes of text into a SQL Server database.

Upvotes: 0

Matias Sincovich
Matias Sincovich

Reputation: 52

Simply use windows sharepaths for that.

EXEC sp_addlinkedserver
@server = 'ExcelLinkSrv1',
@srvproduct = 'Excel', 
@provider = 'Microsoft.ACE.OLEDB.12.0',
@datasrc = '\\SERVEROTHER\c$\DataFiles\EmployeeData1.xlsx',
@provstr = 'Excel 12.0 Xml; HDR=YES; IMEX=1';

Upvotes: 2

Related Questions