user3043997
user3043997

Reputation: 27

Missing the obvious with inconsistently delimited data?

I have built something in SAS to pull down Yahoo! finance .csv data. The code I have built now works fine and I have built some robust error handling into the code. The problem I have had with the data though is that the .csv feed is unsupported and not clean.

The data is comma delimited, but some of the data also has commas in it. Some of the fields are in quotes and some are not. Also the length of the fields varies wildly as as well. A field like Market Capitlisation for example could run form a few million to hundreds of billions.

As a result, if you pass multiple stock metrics for multiple stocks through to the Yahoo! API at the same time, you will get rows of .csv data where each field is in a different place, is a different length and is inconsistently delimited.

I have tried multiple infile options that could handle some of these errors in isolation, but not all of them together. My only solution that works is to download single stock metrics by multiple stocks at the same time.

This gives me what I want, but it takes over an hour to run the data for the NASDAQ and the NYSE. Have I overlooked another method for handling this type of problem?

Thanks

Upvotes: 0

Views: 118

Answers (1)

DomPazz
DomPazz

Reputation: 12465

This is the outline of a way to do what you are looking for. The whole of the code to do this would be too long to post here and out of scope of what this site looks to do.

  1. Create a SAS program that takes a stock ticker from the SYSPARM automatic macro, and downloads the data to a data set named the same as the ticker into a permanent library.

    The SYSPARM macro is set by the value you set on the commandline to call SAS

    sas.exe myprog.sas -sysparm XYZ

    This would set &SYSPARM to resolve XYZ

  2. Write a SAS program that merges all the ticker data sets together for further processing.

  3. Create a program in a language like Perl or Python, (or shell script, etc.) that loops over a range of tickers and calls your SAS program, passing the ticker through SYSPARM.

  4. Use a threading, forking, etc. package from that language to have multiple of these running at the same time. You can probably go to some multiple of the CPU cores on your machine as this processing will not be CPU intensive. Test values to you find one that works.

  5. From that same language call your SAS program to merge the datasets.

Upvotes: 1

Related Questions