Reputation: 31
Below given SSIS script task takes in a ticker symbol and the date range you would like to get back and returns a CSV formatted download that can be used to extract the price history but it does not work and I have no idea why. Full information about this SSIS well-thought-out concept can be found in the below link.
SSIS / ETL Example – Yahoo Equity & Mutual Fund Price History
You can download the sample SSIS package from the below link.
Following SSIS script task has no errors but it does not download a file:
I just started to understand this code, and it seemed that this link is subdivided to components like here, and with the correct values it must work but I don't understand why it is not retrieving the file.
ichart.finance.yahoo.com/table.csv?s={symbol}&a={startMM}&b={startDD}&c=
{startYYYY}&d={endMM}&e={endDD}&f={endYYYY}&g={res}&ignore=.csv
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Configuration;
using System.Collections.Generic;
using System.Data.Sql;
using System.Data.SqlClient;
using System.Net;
using System.Collections.Specialized;
using System.Linq;
using Hash = System.Collections.Generic.Dictionary<string, string>;
namespace ST_361aad0e48354b30b8152952caab8b2b.csproj
{
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
static string dir;
static DateTime end;
const string CSV_FORMAT = "Id,Cusip,Date,Open,High,Low,Close,Volume,Adj Close";
public void Main()
{
// end date is today minus one day (end of day)
end = DateTime.Now;
// output directory stored in SSIS variable
// which can be set at runtime
dir = System.IO.Path.Combine(Dts.Variables["OutputCSV"].Value.ToString(), end.ToString("yyyyMMdd"));
if (!System.IO.Directory.Exists(dir))
System.IO.Directory.CreateDirectory(dir);
// connection string to our database
var connectionString = Dts.Variables["ConnectionString"].Value.ToString();
// the sql command to execute
var sql = Dts.Variables["PriceHistorySqlCommand"].Value.ToString();
var list = new List<Hash>();
using (var cnn = new SqlConnection(connectionString))
{
cnn.Open();
using (var cmd = new SqlCommand(sql, cnn))
{
cmd.CommandTimeout = 0;
var dr = cmd.ExecuteReader();
while (dr.Read())
{
// store result in temporary hash
var h = new Hash();
h["cusip"] = dr["cusip"].ToString();
h["symbol"] = dr["symbol"].ToString();
h["product_id"] = dr["product_id"].ToString();
h["last_price_dt_id"] = dr["last_price_dt_id"].ToString();
list.Add(h);
// process batches of 100 at a time
// (This requires System.Threading.dll (CTP of parallel extensions) to be installed in the GAC)
if (list.Count >= 100)
{
System.Threading.Tasks.Parallel.ForEach(list, item =>
{
var dt = item["last_price_dt_id"].TryGetDateFromDateDimensionId(end.AddYears(-100));
DownloadPriceHistory(item["product_id"], item["cusip"], item["symbol"], dt);
});
list.Clear();
}
}
}
}
// TODO: Add your code here
Dts.TaskResult = (int)ScriptResults.Success;
}
static void DownloadPriceHistory(string id, string cusip, string symbol, DateTime begin)
{
// get write path
var path = System.IO.Path.Combine(dir, cusip + ".csv");
var url = String.Format("http://ichart.finance.yahoo.com/table.csv?s={0}&d={1}&e={2}&f={3}&g=d&a={4}&b={5}&c={6}&ignore=.csv",
symbol.ToUpper(),
(end.Month - 1).ToString("00"), end.Day.ToString("00"), end.Year,
(begin.Month - 1).ToString("00"), begin.Day.ToString("00"), begin.Year);
string csv;
using (WebClient web = new WebClient())
{
try
{
var text = web.DownloadString(url);
var lines = text.Split('\n');
System.Text.StringBuilder sb = new System.Text.StringBuilder();
int i = 0;
foreach (var line in lines)
{
// skip first line its a header
if (i == 0)
sb.AppendLine(CSV_FORMAT);
// ensure line being added is not null
else if (false == String.IsNullOrEmpty(line) && false == String.IsNullOrEmpty(line.Trim()))
sb.AppendLine(id + "," + cusip + "," + line);
i++;
}
// add header and body
csv = sb.ToString();
}
catch (System.Net.WebException)
{
// 404 error
csv = CSV_FORMAT;
}
}
System.IO.File.WriteAllText(path, csv);
}
}
/// <summary>
/// Some simple extension methods.
/// </summary>
public static class ExtensionMethods
{
/// <summary>
/// Gets a datetime object from a dimension id string for example '20090130' would be translated to
/// a proper datetime of '01-30-2009 00:00:00'. If the string is empty than we default to the passed
/// in <paramref name="defaultIfNull"/>.
/// </summary>
/// <param name="str">The string</param>
/// <param name="defaultIfNull">The default null.</param>
/// <returns>Returns the datetime.</returns>
public static DateTime TryGetDateFromDateDimensionId(this string str, DateTime defaultIfNull)
{
if (String.IsNullOrEmpty(str)) return defaultIfNull;
return DateTime.Parse(str.Substring(4, 2) + "/" + str.Substring(6, 2) + "/" + str.Substring(0, 4));
}
}
}
Upvotes: 2
Views: 2296
Reputation:
There is another way to import the ticker symbol price history from Yahoo Chart website into database using SSIS. Here is a sample package written using SSIS 2008 R2
with database in SQL Server 2008 R2
Create an SSIS package named (say SO_14797886.dtsx
) using Business Intelligence Development Studio (BIDS)
and create an OLE DB
connection manager/data source that connects to your database. This sample uses the data source OLEDB_Sora.ds
that connects to the database Sora
on my local machine running the instance KIWI\SQLSERVER2008R2
. KIWI
is the machine name and SQLSERVER2008R2
is the instance name.
Execute the below given script in the database to create two tables.
Table dbo.TickerSymbols
will hold the information about list of ticker symbols and the start and end dates for which you would like to import the price files along with the resolution of the import. Resolution can contain values like d
for day
; w
for weekly
; m
for monthly
; and y
for yearly
.
Table dbo.TickerPriceHistory
will hold the price history information of the symbols downloaded from Yahoo Finance Chart website.
Insert script has added four records for the ticker symbols AAPL
(Apple
); MSFT
(Microsoft
); GOOG
(Google
); and YHOO
(Yahoo
). Each record is set with different date ranges and resolution.
CREATE TABLE dbo.TickerSymbols
(
Id int IDENTITY(1,1) NOT NULL
, Symbol varchar(10) NOT NULL
, StartDate datetime NOT NULL
, EndDate datetime NOT NULL
, Resolution char(1) NOT NULL
, CONSTRAINT [PK_TickerSymbols] PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO
CREATE TABLE dbo.TickerPriceHistory
(
Id int IDENTITY(1,1) NOT NULL
, Symbol varchar(10) NOT NULL
, PriceDate datetime NOT NULL
, PriceOpen numeric(18,2) NULL
, PriceHigh numeric(18,2) NULL
, PriceLow numeric(18,2) NULL
, PriceClose numeric(18,2) NULL
, Volume bigint NULL
, AdjustmentClose numeric(18,2) NULL
, CONSTRAINT [PK_TickerPriceHistory] PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO
INSERT INTO dbo.TickerSymbols (Symbol, StartDate, EndDate, Resolution) VALUES
('AAPL', '2012-02-01', '2012-02-04', 'd')
, ('GOOG', '2013-01-01', '2013-01-31', 'w')
, ('MSFT', '2012-09-01', '2012-11-30', 'm')
, ('YHOO', '2012-01-01', '2012-12-31', 'y')
;
GO
On the SSIS package, create the following variables.
EndDate
: The package will use this variable of data type DateTime
to hold the end date of the symbol being looped through in the record set list.
FileExtension
: This variable of data type String
will hold the file extension to use for the downloaded files. This is optional.
FileName
: This variable of data type String
will hold the name of the file for a given symbol. The name is generated based on timestamp to avoid overwriting previously downloaded files. Click the variable and press F4 to view properties. Change the property EvaluateAsExpression
to True
. Click on the Ellipsis button against Expression to open the Expression Builder. Set the Expression
to the following value. This expression will evaluate to value like MSFT_20130210_092519.csv
, where MSFT is the symbol and the rest of the information is package start time in the format yyyMMdd_hhmmss
and .csv
is the file extension.
@[User::Symbol] + "_" + (DT_WSTR, 4) YEAR(@[System::StartTime]) + RIGHT("00" + (DT_WSTR, 2) MONTH(@[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DAY(@[System::StartTime]), 2) + "_" + RIGHT("00" + (DT_WSTR, 2) DATEPART("hh", @[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DATEPART("mi", @[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DATEPART("ss", @[System::StartTime]), 2) + @[User::FileExtension]
FilePath
: This variable of data type String
will hold the complete path of the downloaded file for a given symbol. Click the variable and press F4 to view properties. Change the property EvaluateAsExpression
to True
. Click on the Ellipsis button against Expression to open the Expression Builder. Set the Expression
to the value @[User::RootFolder] + "\\" + @[User::FileName]
. We will use this express to
Resolution
: The package will use this variable of data type String
to hold the reolution information of the symbol being looped through in the record set list.
RootFolder
: This variable of data type String
will hold the root folder where the files should be downloaded to.
SQL_GetSymbols
: This variable of data type String
will contain the T-SQL query to fetch the ticker symbols information from database. Set the value to SELECT Symbol, StartDate, EndDate, Resolution FROM dbo.TickerSymbols
StartDate
: The package will use this variable of data type DateTime
to hold the start date of the symbol being looped through in the record set list.
Symbol
: The package will use this variable of data type String
to hold the ticker symbol as it loops through each record in the record set list.
SymbolsList
: The package will use this variable of data type Object
to hold the result set of ticker symbols stored in the database.
URLYahooChart
: This variable of data type String
will hold the URL to Yahoo Finance Chart website with place holders to fill in the appropriate values for query string. Set the value to http://ichart.finance.yahoo.com/table.csv?s={0}&a={1}&b={2}&c={3}&d={4}&e={5}&f={6}&g={7}&ignore=.csv
On the package, right-click on the Connection Managers tab and click Flat File Connection...
On the General
page of Flat File Connection Manager Editor, perform the following actions:
Set the Name to FILE_TickerPriceHistory
Set the Description to Read the ticker symbol price history.
If you already have a sample file, point to the file location. SSIS will infer the settings from the data in the file. In this case, I already downloaded a file by navigating the URL http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=9&b=1&c=2012&d=11&e=30&f=2012&g=m&ignore=.csv
and saved it under the name C:\Siva\StackOverflow\Files\14797886\Data\\MSFT_20130210_092519.csv
Make sure the Format is set to Delimited
.
Make sure the Header row delimiter is set to {CR}{LF}
Check the box Column names in the first data row
Click Columns page
On the Columns
page of Flat File Connection Manager Editor, make sure that the Row delimiter is set to {LF}
and Column delimiter is set to Comma {,}
. Click Advanced page.
On the Advanced
page of Flat File Connection Manager Editor, the columns will be created based on the file information. Change the values as shown below so that column names match with names in the database. This way the column mapping will be easier. All columns except the last column should have the ColumnDelimiter set to Comma {,}
. Lsst column should have the the ColumnDelimiter set to {LF}
.
Column Data type DataPrecision DataScale
------------------- ------------------------------------ ------------- ---------
PriceDate date [DT_DATE]
PriceOpen numeric [DT_NUMERIC] 18 2
PriceHigh numeric [DT_NUMERIC] 18 2
PriceLow numeric [DT_NUMERIC] 18 2
PriceClose numeric [DT_NUMERIC] 18 2
Volume eight-byte unsigned integer [DT_UI8]
AdjustmentClose numeric [DT_NUMERIC] 18 2
You should now see both the connection managers at the bottom of the package.
Drag and drop an Execute SQL Task
on to the Control Flow tab and perform the following actions on the General tab.
Get symbols from database
Fetch the list of symbols and its download settings from database.
Full result set
because the query will return a record set.OLE DB
OLEDB_Sora
Variable
from SQLSourceTypeUser::SQL_GetSymbols
from SourceVariableOn the Result Set page of Execute SQL Task, click Add and set Result Name to 0
indicating the index of the result set. Select User::SymbolsList
from the Variable Name to store the result set into object variable.
Drag and drop a Foreach Loop Container and place it after the Execute SQL Task. Connect the Execute SQL Task green arrow to Foreach Loop Container. Double-click Foreach Loop Container to view Foreach Loop Editor. Configure the ForEach Loop Editor as shown below.
On the Variable Mappings page of ForEach Loop Editor, configure it as shown below:
Drag and drop a Script Task
inside the ForEach Loop Container. Double-click the Script Task to open the Script Task Editor. On the Script page of the Script Task editor, click the Ellipsis button against the ReadOnlyVariables
and select the below listed variables. We need to use these inside the Script Task code.
Click the Edit Script...
button on the Script Task Editor and type the below code. After typing the code, close the Script Task Editor.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Net;
namespace ST_5fa66fe26d20480e8e3258a8fbd16683.csproj
{
[System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
{
#region VSTA generated code
enum ScriptResults
{
Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
};
#endregion
public void Main()
{
try
{
string symbol = Dts.Variables["User::Symbol"].Value.ToString();
DateTime startDate = Convert.ToDateTime(Dts.Variables["User::StartDate"].Value);
DateTime endDate = Convert.ToDateTime(Dts.Variables["User::EndDate"].Value);
string resolution = Dts.Variables["User::Resolution"].Value.ToString();
string urlYahooChart = Dts.Variables["User::URLYahooChart"].Value.ToString();
string rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();;
string fileExtension = Dts.Variables["User::FileExtension"].Value.ToString();
string fileName = Dts.Variables["User::FileName"].Value.ToString();
string downloadPath = Dts.Variables["User::FilePath"].Value.ToString();
if (!System.IO.Directory.Exists(rootFolder))
System.IO.Directory.CreateDirectory(rootFolder);
urlYahooChart = string.Format(urlYahooChart
, symbol
, startDate.Month
, startDate.Day
, startDate.Year
, endDate.Month
, endDate.Day
, endDate.Year
, resolution);
bool refire = false;
Dts.Events.FireInformation(0, string.Format("Download URL of {0}", symbol), urlYahooChart, string.Empty, 0, ref refire);
WebClient webClient = new WebClient();
webClient.DownloadFile(urlYahooChart, downloadPath);
Dts.TaskResult = (int)ScriptResults.Success;
}
catch (Exception ex)
{
Dts.Events.FireError(0, "Download error", ex.ToString(), string.Empty, 0);
}
}
}
}
Drag and drop a Data Flow Task
inside the Foreach Loop Container after the Script Task. Connect the green arrow from Script Task to the Data Flow Task. The Control Flow tab should look as shown below.
On the Data Flow Task, drag and drop Flat File Source and configure it as shown below to read the price history CSV files.
Drag and drop a Derived Column Transformation
and create a new column named Symbol
with the expression (DT_STR,10,1252)@[User::Symbol]
to add the Symbol to the data pipeline.
Drag and drop OLE DB Destination and configure it as shown below to insert the data into the database.
Your Data Flow tab should look like as shown below:
Before running the package, we need to make couple of changes to prevent any warnings or errors on the design time view due to the absence of the files in the folder.
Click the flat file connection manager FILE_TickerPriceHistory
and press F4
to view the properties. Change the property DelayValidation
to True
. This will make sure that the validation of file existence will happen during runtime. Click the Ellipsis button against the Expression and set the ConnectionString
property to the value @[User::FilePath]
. This will change the file path as each file is being downloaded from the website.
Click the Data Flow Task and press F4
to view the properties. Change the property DelayValidation
to True
. This will make sure that the validation of file existence will happen during runtime.
Navigate to the Data Flow tab and click the Flat File Source and press F4
to view the properties. Change the property ValidateExternalMetadata
to False
. This will make sure that the validation of flat file existence will happen during runtime.
Let us navigate to the folder C:\Siva\StackOverflow\Files\14797886
, where the downloaded files will be saved is empty. The folder does not have to be empty. This is just for execution check.
Run the following SQL statements against the database to verify the data in the table. The second table should be empty.
SELECT * FROM dbo.TickerSymbols;
SELECT * FROM dbo.TickerPriceHistory;
Execute the package. If everything is set up correctly, the package should run successfully and download the files for each symbol listed in table dbo.TickerSymbols
The files should be successfully saved to the folder C:\Siva\StackOverflow\Files\14797886
. Notice that each file is named appropriately based on the expressions provided in the package.
Run the following SQL statement against the database to verify the data in the table. The table dbo.TickerPriceHistory
should now have the data from the price files downloaded from website.
SELECT * FROM dbo.TickerPriceHistory;
The above sample package illustrated how to download price files from Yahoo Finance Chart website for a given list of ticker symbols and load them into the database.
Upvotes: 15