Stifler
Stifler

Reputation: 31

How to fix the Script Task code that downloads ticker price files from Yahoo and inserts into database?

Below given SSIS script task takes in a ticker symbol and the date range you would like to get back and returns a CSV formatted download that can be used to extract the price history but it does not work and I have no idea why. Full information about this SSIS well-thought-out concept can be found in the below link.

SSIS / ETL Example – Yahoo Equity & Mutual Fund Price History

You can download the sample SSIS package from the below link.

Sample package on SkyDrive

Following SSIS script task has no errors but it does not download a file:

I just started to understand this code, and it seemed that this link is subdivided to components like here, and with the correct values it must work but I don't understand why it is not retrieving the file.

ichart.finance.yahoo.com/table.csv?s={symbol}&a={startMM}&b={startDD}&c=
{‌​startYYYY}&d={endMM}&e={endDD}&f={endYYYY}&g={res}&ignore=.csv

Script task code that I am using:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Configuration;
using System.Collections.Generic;
using System.Data.Sql;
using System.Data.SqlClient;
using System.Net;
using System.Collections.Specialized;
using System.Linq;

using Hash = System.Collections.Generic.Dictionary<string, string>;

namespace ST_361aad0e48354b30b8152952caab8b2b.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {

        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        static string dir;
        static DateTime end;
        const string CSV_FORMAT = "Id,Cusip,Date,Open,High,Low,Close,Volume,Adj Close";

        public void Main()
        {
            // end date is today minus one day (end of day)
            end = DateTime.Now;

            // output directory stored in SSIS variable
            // which can be set at runtime
            dir = System.IO.Path.Combine(Dts.Variables["OutputCSV"].Value.ToString(), end.ToString("yyyyMMdd"));
            if (!System.IO.Directory.Exists(dir))
                System.IO.Directory.CreateDirectory(dir);

            // connection string to our database
            var connectionString = Dts.Variables["ConnectionString"].Value.ToString();

            // the sql command to execute
            var sql = Dts.Variables["PriceHistorySqlCommand"].Value.ToString();

            var list = new List<Hash>();
            using (var cnn = new SqlConnection(connectionString))
            {
                cnn.Open();
                using (var cmd = new SqlCommand(sql, cnn))
                {
                    cmd.CommandTimeout = 0;
                    var dr = cmd.ExecuteReader();
                    while (dr.Read())
                    {
                        // store result in temporary hash
                        var h = new Hash();
                        h["cusip"] = dr["cusip"].ToString();
                        h["symbol"] = dr["symbol"].ToString();
                        h["product_id"] = dr["product_id"].ToString();
                        h["last_price_dt_id"] = dr["last_price_dt_id"].ToString();

                        list.Add(h);

                        // process batches of 100 at a time 
                        // (This requires System.Threading.dll (CTP of parallel extensions) to be installed in the GAC)
                        if (list.Count >= 100)
                        {
                            System.Threading.Tasks.Parallel.ForEach(list, item =>
                            {
                                var dt = item["last_price_dt_id"].TryGetDateFromDateDimensionId(end.AddYears(-100));
                                DownloadPriceHistory(item["product_id"], item["cusip"], item["symbol"], dt);
                            });
                            list.Clear();
                        }
                    }
                }
            }


            // TODO: Add your code here
            Dts.TaskResult = (int)ScriptResults.Success;
        }

        static void DownloadPriceHistory(string id, string cusip, string symbol, DateTime begin)
        {
            // get write path
            var path = System.IO.Path.Combine(dir, cusip + ".csv");
            var url = String.Format("http://ichart.finance.yahoo.com/table.csv?s={0}&d={1}&e={2}&f={3}&g=d&a={4}&b={5}&c={6}&ignore=.csv",
                symbol.ToUpper(),
                (end.Month - 1).ToString("00"), end.Day.ToString("00"), end.Year,
                (begin.Month - 1).ToString("00"), begin.Day.ToString("00"), begin.Year);
            string csv;

            using (WebClient web = new WebClient())
            {
                try
                {
                    var text = web.DownloadString(url);
                    var lines = text.Split('\n');
                    System.Text.StringBuilder sb = new System.Text.StringBuilder();
                    int i = 0;
                    foreach (var line in lines)
                    {
                        // skip first line its a header
                        if (i == 0)
                            sb.AppendLine(CSV_FORMAT);
                        // ensure line being added is not null
                        else if (false == String.IsNullOrEmpty(line) && false == String.IsNullOrEmpty(line.Trim()))
                            sb.AppendLine(id + "," + cusip + "," + line);
                        i++;
                    }
                    // add header and body
                    csv = sb.ToString();
                }
                catch (System.Net.WebException)
                {
                    // 404 error
                    csv = CSV_FORMAT;
                }
            }

            System.IO.File.WriteAllText(path, csv);
        }
    }

    /// <summary>
    /// Some simple extension methods.
    /// </summary>
    public static class ExtensionMethods
    {
        /// <summary>
        /// Gets a datetime object from a dimension id string for example '20090130' would be translated to
        /// a proper datetime of '01-30-2009 00:00:00'. If the string is empty than we default to the passed
        /// in <paramref name="defaultIfNull"/>.
        /// </summary>
        /// <param name="str">The string</param>
        /// <param name="defaultIfNull">The default null.</param>
        /// <returns>Returns the datetime.</returns>
        public static DateTime TryGetDateFromDateDimensionId(this string str, DateTime defaultIfNull)
        {
            if (String.IsNullOrEmpty(str)) return defaultIfNull;
            return DateTime.Parse(str.Substring(4, 2) + "/" + str.Substring(6, 2) + "/" + str.Substring(0, 4));
        }
    }
}

Upvotes: 2

Views: 2296

Answers (1)

user756519
user756519

Reputation:

Import ticker price history from Yahoo Finance Chart website using SSIS:

There is another way to import the ticker symbol price history from Yahoo Chart website into database using SSIS. Here is a sample package written using SSIS 2008 R2 with database in SQL Server 2008 R2

Create an SSIS package named (say SO_14797886.dtsx) using Business Intelligence Development Studio (BIDS) and create an OLE DB connection manager/data source that connects to your database. This sample uses the data source OLEDB_Sora.ds that connects to the database Sora on my local machine running the instance KIWI\SQLSERVER2008R2. KIWI is the machine name and SQLSERVER2008R2 is the instance name.

Execute the below given script in the database to create two tables.

  • Table dbo.TickerSymbols will hold the information about list of ticker symbols and the start and end dates for which you would like to import the price files along with the resolution of the import. Resolution can contain values like d for day; w for weekly; m for monthly; and y for yearly.

  • Table dbo.TickerPriceHistory will hold the price history information of the symbols downloaded from Yahoo Finance Chart website.

  • Insert script has added four records for the ticker symbols AAPL (Apple); MSFT (Microsoft); GOOG (Google); and YHOO (Yahoo). Each record is set with different date ranges and resolution.

Script to create tables and insert few ticker symbols data:

CREATE TABLE dbo.TickerSymbols
(
        Id int IDENTITY(1,1) NOT NULL
    ,   Symbol varchar(10) NOT NULL
    ,   StartDate datetime NOT NULL
    ,   EndDate datetime NOT NULL
    ,   Resolution char(1) NOT NULL
    ,   CONSTRAINT [PK_TickerSymbols] PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO

CREATE TABLE dbo.TickerPriceHistory
(
        Id int IDENTITY(1,1) NOT NULL
    ,   Symbol varchar(10) NOT NULL
    ,   PriceDate datetime NOT NULL
    ,   PriceOpen numeric(18,2) NULL
    ,   PriceHigh numeric(18,2) NULL
    ,   PriceLow numeric(18,2) NULL
    ,   PriceClose numeric(18,2) NULL
    ,   Volume bigint NULL
    ,   AdjustmentClose numeric(18,2) NULL
    ,   CONSTRAINT [PK_TickerPriceHistory] PRIMARY KEY CLUSTERED ([Id] ASC)
);
GO

INSERT INTO dbo.TickerSymbols (Symbol, StartDate, EndDate, Resolution) VALUES
        ('AAPL', '2012-02-01', '2012-02-04', 'd')
    ,   ('GOOG', '2013-01-01', '2013-01-31', 'w')
    ,   ('MSFT', '2012-09-01', '2012-11-30', 'm')
    ,   ('YHOO', '2012-01-01', '2012-12-31', 'y')
    ;
GO

On the SSIS package, create the following variables.

  • EndDate: The package will use this variable of data type DateTime to hold the end date of the symbol being looped through in the record set list.

  • FileExtension: This variable of data type String will hold the file extension to use for the downloaded files. This is optional.

  • FileName: This variable of data type String will hold the name of the file for a given symbol. The name is generated based on timestamp to avoid overwriting previously downloaded files. Click the variable and press F4 to view properties. Change the property EvaluateAsExpression to True. Click on the Ellipsis button against Expression to open the Expression Builder. Set the Expression to the following value. This expression will evaluate to value like MSFT_20130210_092519.csv, where MSFT is the symbol and the rest of the information is package start time in the format yyyMMdd_hhmmss and .csv is the file extension.

@[User::Symbol] + "_" + (DT_WSTR, 4) YEAR(@[System::StartTime]) + RIGHT("00" + (DT_WSTR, 2) MONTH(@[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DAY(@[System::StartTime]), 2) + "_" + RIGHT("00" + (DT_WSTR, 2) DATEPART("hh", @[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DATEPART("mi", @[System::StartTime]), 2) + RIGHT("00" + (DT_WSTR, 2) DATEPART("ss", @[System::StartTime]), 2) + @[User::FileExtension]

  • FilePath: This variable of data type String will hold the complete path of the downloaded file for a given symbol. Click the variable and press F4 to view properties. Change the property EvaluateAsExpression to True. Click on the Ellipsis button against Expression to open the Expression Builder. Set the Expression to the value @[User::RootFolder] + "\\" + @[User::FileName]. We will use this express to

  • Resolution: The package will use this variable of data type String to hold the reolution information of the symbol being looped through in the record set list.

  • RootFolder: This variable of data type String will hold the root folder where the files should be downloaded to.

  • SQL_GetSymbols: This variable of data type String will contain the T-SQL query to fetch the ticker symbols information from database. Set the value to SELECT Symbol, StartDate, EndDate, Resolution FROM dbo.TickerSymbols

  • StartDate: The package will use this variable of data type DateTime to hold the start date of the symbol being looped through in the record set list.

  • Symbol: The package will use this variable of data type String to hold the ticker symbol as it loops through each record in the record set list.

  • SymbolsList: The package will use this variable of data type Object to hold the result set of ticker symbols stored in the database.

  • URLYahooChart: This variable of data type String will hold the URL to Yahoo Finance Chart website with place holders to fill in the appropriate values for query string. Set the value to http://ichart.finance.yahoo.com/table.csv?s={0}&a={1}&b={2}&c={3}&d={4}&e={5}&f={6}&g={7}&ignore=.csv

Variables

On the package, right-click on the Connection Managers tab and click Flat File Connection...

Flat File Connection Manager

On the General page of Flat File Connection Manager Editor, perform the following actions:

  • Set the Name to FILE_TickerPriceHistory

  • Set the Description to Read the ticker symbol price history.

  • If you already have a sample file, point to the file location. SSIS will infer the settings from the data in the file. In this case, I already downloaded a file by navigating the URL http://ichart.finance.yahoo.com/table.csv?s=MSFT&a=9&b=1&c=2012&d=11&e=30&f=2012&g=m&ignore=.csv and saved it under the name C:\Siva\StackOverflow\Files\14797886\Data\\MSFT_20130210_092519.csv

  • Make sure the Format is set to Delimited.

  • Make sure the Header row delimiter is set to {CR}{LF}

  • Check the box Column names in the first data row

  • Click Columns page

Flat File Connection Manager Editor - General

On the Columns page of Flat File Connection Manager Editor, make sure that the Row delimiter is set to {LF} and Column delimiter is set to Comma {,}. Click Advanced page.

Flat File Connection Manager Editor - Columns

On the Advanced page of Flat File Connection Manager Editor, the columns will be created based on the file information. Change the values as shown below so that column names match with names in the database. This way the column mapping will be easier. All columns except the last column should have the ColumnDelimiter set to Comma {,}. Lsst column should have the the ColumnDelimiter set to {LF}.

Column              Data type                            DataPrecision DataScale
------------------- ------------------------------------ ------------- ---------
PriceDate           date [DT_DATE]
PriceOpen           numeric [DT_NUMERIC]                      18          2
PriceHigh           numeric [DT_NUMERIC]                      18          2
PriceLow            numeric [DT_NUMERIC]                      18          2
PriceClose          numeric [DT_NUMERIC]                      18          2
Volume              eight-byte unsigned integer [DT_UI8]      
AdjustmentClose     numeric [DT_NUMERIC]                      18          2

Flat File Connection Manager Editor - Advanced

You should now see both the connection managers at the bottom of the package.

Connection Managers Created

Drag and drop an Execute SQL Task on to the Control Flow tab and perform the following actions on the General tab.

  • Set the Name to Get symbols from database
  • Set the Description to Fetch the list of symbols and its download settings from database.
  • Set the ResultSet to Full result set because the query will return a record set.
  • Set the ConnectionType to OLE DB
  • Set the Connection to OLEDB_Sora
  • Select Variable from SQLSourceType
  • Select User::SQL_GetSymbols from SourceVariable
  • Click Result Set page.

Execute SQL Task - General

On the Result Set page of Execute SQL Task, click Add and set Result Name to 0 indicating the index of the result set. Select User::SymbolsList from the Variable Name to store the result set into object variable.

Execute SQL Task - Result Set

Drag and drop a Foreach Loop Container and place it after the Execute SQL Task. Connect the Execute SQL Task green arrow to Foreach Loop Container. Double-click Foreach Loop Container to view Foreach Loop Editor. Configure the ForEach Loop Editor as shown below.

ForEach Loop Editor - Collection

On the Variable Mappings page of ForEach Loop Editor, configure it as shown below:

ForEach Loop Editor - Variable Mappings

Drag and drop a Script Task inside the ForEach Loop Container. Double-click the Script Task to open the Script Task Editor. On the Script page of the Script Task editor, click the Ellipsis button against the ReadOnlyVariables and select the below listed variables. We need to use these inside the Script Task code.

  • User::EndDate
  • User::FileExtension
  • User::FileName
  • User::FilePath
  • User::Resolution
  • User::RootFolder
  • User::StartDate
  • User::Symbol
  • User::URLYahooChart

Script Task Editor - Script

Click the Edit Script... button on the Script Task Editor and type the below code. After typing the code, close the Script Task Editor.

Script Task code in C#:

using System;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Net;

namespace ST_5fa66fe26d20480e8e3258a8fbd16683.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            try
            {
                string symbol = Dts.Variables["User::Symbol"].Value.ToString();
                DateTime startDate = Convert.ToDateTime(Dts.Variables["User::StartDate"].Value);
                DateTime endDate = Convert.ToDateTime(Dts.Variables["User::EndDate"].Value);
                string resolution = Dts.Variables["User::Resolution"].Value.ToString();
                string urlYahooChart = Dts.Variables["User::URLYahooChart"].Value.ToString();

                string rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();;
                string fileExtension = Dts.Variables["User::FileExtension"].Value.ToString();
                string fileName = Dts.Variables["User::FileName"].Value.ToString();
                string downloadPath = Dts.Variables["User::FilePath"].Value.ToString();

                if (!System.IO.Directory.Exists(rootFolder))
                    System.IO.Directory.CreateDirectory(rootFolder);

                urlYahooChart = string.Format(urlYahooChart
                                        , symbol
                                        , startDate.Month
                                        , startDate.Day
                                        , startDate.Year
                                        , endDate.Month
                                        , endDate.Day
                                        , endDate.Year
                                        , resolution);

                bool refire = false;
                Dts.Events.FireInformation(0, string.Format("Download URL of {0}", symbol), urlYahooChart, string.Empty, 0, ref refire);

                WebClient webClient = new WebClient();
                webClient.DownloadFile(urlYahooChart, downloadPath);

                Dts.TaskResult = (int)ScriptResults.Success;
            }
            catch (Exception ex)
            {
                Dts.Events.FireError(0, "Download error", ex.ToString(), string.Empty, 0);
            }
        }
    }
}

Drag and drop a Data Flow Task inside the Foreach Loop Container after the Script Task. Connect the green arrow from Script Task to the Data Flow Task. The Control Flow tab should look as shown below.

Control Flow Tab

On the Data Flow Task, drag and drop Flat File Source and configure it as shown below to read the price history CSV files.

Flat File Source - Connection Manager

Flat File Source - Columns

Drag and drop a Derived Column Transformation and create a new column named Symbol with the expression (DT_STR,10,1252)@[User::Symbol] to add the Symbol to the data pipeline.

Derived Column Transformation

Drag and drop OLE DB Destination and configure it as shown below to insert the data into the database.

OLE DB Destination - Connection Manager

OLE DB Destination - Mappings

Your Data Flow tab should look like as shown below:

Data Flow Tab

Before running the package, we need to make couple of changes to prevent any warnings or errors on the design time view due to the absence of the files in the folder.

Click the flat file connection manager FILE_TickerPriceHistory and press F4 to view the properties. Change the property DelayValidation to True. This will make sure that the validation of file existence will happen during runtime. Click the Ellipsis button against the Expression and set the ConnectionString property to the value @[User::FilePath]. This will change the file path as each file is being downloaded from the website.

Flat File Connection Manager - Delay Validation

Click the Data Flow Task and press F4 to view the properties. Change the property DelayValidation to True. This will make sure that the validation of file existence will happen during runtime.

Data Flow Task - DelayValidation

Navigate to the Data Flow tab and click the Flat File Source and press F4 to view the properties. Change the property ValidateExternalMetadata to False. This will make sure that the validation of flat file existence will happen during runtime.

Flat File Source - ValidateExternalMetadata

Let us navigate to the folder C:\Siva\StackOverflow\Files\14797886, where the downloaded files will be saved is empty. The folder does not have to be empty. This is just for execution check.

Download folder empty

Run the following SQL statements against the database to verify the data in the table. The second table should be empty.

SELECT * FROM dbo.TickerSymbols;
SELECT * FROM dbo.TickerPriceHistory;

Table data before execution

Execute the package. If everything is set up correctly, the package should run successfully and download the files for each symbol listed in table dbo.TickerSymbols

Control Flow Execution

Data Flow Execution

The files should be successfully saved to the folder C:\Siva\StackOverflow\Files\14797886. Notice that each file is named appropriately based on the expressions provided in the package.

Downloaded files

Run the following SQL statement against the database to verify the data in the table. The table dbo.TickerPriceHistory should now have the data from the price files downloaded from website.

SELECT * FROM dbo.TickerPriceHistory;

Table data after execution

The above sample package illustrated how to download price files from Yahoo Finance Chart website for a given list of ticker symbols and load them into the database.

Upvotes: 15

Related Questions