Reputation: 1269
I have a database with 4 tables filled with millions of rows. I have my program run on several computers computing data and then returning it to the database. The huge bottleneck in my program design is that for each calculation, it has to download the data and then perform calculations on it and then save the results to the database. When I had the data on the local network it performed with crazy speed so I realize that the resources to download the data from a remote server is the problem.
What are some ways I can save data from the remote database either before or after my code runs so I can make my program more efficient. These calculations are done once and aren't needed again and I have 24 computers running this same program.
static void Main(string[] args)
{
try
{
List<StockData> stockData = new List<StockData>();
List<StockMarketCompare> stockCompareData = new List<StockMarketCompare>();
List<StockData> sandpInfo = new List<StockData>();
List<StockData> sandpDateInfo = new List<StockData>();
List<StockData> globalList = new List<StockData>();
List<StockData> amexList = new List<StockData>();
List<StockData> nasdaqList = new List<StockData>();
List<StockData> nyseList = new List<StockData>();
List<DateTime> completedDates = new List<DateTime>();
SymbolInfo symbolClass = new SymbolInfo();
bool isGoodToGo = false;
string symbol, market;
int activeSymbolsCount = 0;
int rowCount = 0, completedRowCount = 0;
DateTime date = new DateTime();
DateTime searchDate = new DateTime();
// get the data here
using (StockRatingsTableAdapter stockRatingsAdapter = new StockRatingsTableAdapter())
using (OoplesDataSet.StockRatingsDataTable stockRatingsTable = new OoplesDataSet.StockRatingsDataTable())
using (SymbolsTableAdapter symbolAdapter = new SymbolsTableAdapter())
using (OoplesDataSet.SymbolsDataTable symbolTable = new OoplesDataSet.SymbolsDataTable())
using (DailyAmexDataTableAdapter dailyAmexAdapter = new DailyAmexDataTableAdapter())
using (OoplesDataSet.DailyAmexDataDataTable dailyAmexTable = new OoplesDataSet.DailyAmexDataDataTable())
using (OoplesDataSet.OldStockRatingsDataTable historicalRatingsTable = new OoplesDataSet.OldStockRatingsDataTable())
using (OldStockRatingsTableAdapter historicalRatingsAdapter = new OldStockRatingsTableAdapter())
using (OoplesDataSet.OldStockRatingsDataTable historicalRatingSymbolTable = new OoplesDataSet.OldStockRatingsDataTable())
using (OldStockRatingsTableAdapter historicalRatingSymbolAdapter = new OldStockRatingsTableAdapter())
using (OoplesDataSet.DailyGlobalDataDataTable sandp500Table = new OoplesDataSet.DailyGlobalDataDataTable())
using (OoplesDataSet.CurrentSymbolsDataTable currentSymbolTable = new OoplesDataSet.CurrentSymbolsDataTable())
using (CurrentSymbolsTableAdapter currentSymbolAdapter = new CurrentSymbolsTableAdapter())
{
// fill the s&p500 info first
dailyGlobalAdapter.ClearBeforeFill = true;
dailyGlobalAdapter.FillBySymbol(sandp500Table, Calculations.sp500);
var sandpQuery = from c in sandp500Table
select new StockData { Close = c.Close, Date = c.Date, High = c.High, Low = c.Low, Volume = c.Volume };
sandpInfo = sandpQuery.AsParallel().ToList();
// set the settings for the historical ratings adapter
historicalRatingsAdapter.ClearBeforeFill = true;
// fill the stock ratings info
stockRatingsAdapter.Fill(stockRatingsTable);
// get all symbols in the stock ratings table
var symbolsAmountQuery = from c in stockRatingsTable
select new SymbolMarket { Symbol = c.Symbol, Market = c.Market };
List<SymbolMarket> ratingSymbols = symbolsAmountQuery.AsParallel().ToList();
if (ratingSymbols != null)
{
activeSymbolsCount = ratingSymbols.AsParallel().Count();
}
for (int i = 0; i < activeSymbolsCount; i++)
{
symbol = ratingSymbols.AsParallel().ElementAtOrDefault(i).Symbol;
market = ratingSymbols.AsParallel().ElementAtOrDefault(i).Market;
dailyAmexAdapter.FillBySymbol(dailyAmexTable, symbol);
historicalRatingSymbolAdapter.FillBySymbolMarket(historicalRatingSymbolTable, market, symbol);
if (dailyAmexTable != null)
{
var amexFillQuery = from c in dailyAmexTable
select new StockData { Close = c.Close, Date = c.Date, High = c.High, Low = c.Low, Volume = c.Volume };
amexList = amexFillQuery.AsParallel().ToList();
rowCount = amexList.AsParallel().Count();
}
if (historicalRatingSymbolTable != null)
{
completedRowCount = historicalRatingSymbolTable.AsParallel().Count();
completedDates = historicalRatingSymbolTable.AsParallel().Select(d => d.Date).ToList();
}
currentSymbolAdapter.Fill(currentSymbolTable);
var currentSymbolQuery = from c in currentSymbolTable
where c.Symbol == symbol && c.Market == market
select c;
List<OoplesDataSet.CurrentSymbolsRow> currentSymbolRow = currentSymbolQuery.AsParallel().ToList();
// if the rows don't match up and if no other computer is working on the same symbol
if (rowCount - 30 != completedRowCount && currentSymbolRow.Count == 0)
{
// update the table to let the other computers know that we are working on this symbol
var computerQuery = from c in currentSymbolTable
where c.ComputerName == Environment.MachineName
select c;
List<OoplesDataSet.CurrentSymbolsRow> currentComputerRow = computerQuery.AsParallel().ToList();
if (currentComputerRow.Count > 0)
{
// update
currentComputerRow.AsParallel().ElementAtOrDefault(0).Symbol = symbol;
currentComputerRow.AsParallel().ElementAtOrDefault(0).Market = market;
OoplesDataSet.CurrentSymbolsDataTable tempCurrentTable = new OoplesDataSet.CurrentSymbolsDataTable();
tempCurrentTable = (OoplesDataSet.CurrentSymbolsDataTable)currentSymbolTable.GetChanges();
if (tempCurrentTable != null)
{
currentSymbolAdapter.Adapter.UpdateCommand.UpdatedRowSource = System.Data.UpdateRowSource.None;
currentSymbolAdapter.Update(tempCurrentTable);
tempCurrentTable.AcceptChanges();
tempCurrentTable.Dispose();
Console.WriteLine(Environment.MachineName + " has claimed dominion over " + symbol + " in the " + market + " market!");
}
}
else
{
// insert
currentSymbolAdapter.Insert(symbol, market, Environment.MachineName);
Console.WriteLine(Environment.MachineName + " has claimed dominion over " + symbol + " in the " + market + " market!");
}
Parallel.For(0, rowCount - 30, new ParallelOptions
{
MaxDegreeOfParallelism = Environment.ProcessorCount
}, j =>
{
if (amexList.AsParallel().Count() > 0)
{
date = amexList.AsParallel().ElementAtOrDefault(j).Date;
searchDate = date.Subtract(TimeSpan.FromDays(60));
if (completedDates.Contains(date) == false)
{
var amexQuery = from c in sandpInfo
where c.Date >= searchDate && c.Date <= date
join d in amexList on c.Date equals d.Date
select new StockMarketCompare { stockClose = d.Close, marketClose = c.Close };
var amexStockDataQuery = from c in amexList
where c.Date >= searchDate && c.Date <= date
select new StockData { Close = c.Close, High = c.High, Low = c.Low, Volume = c.Volume, Date = c.Date };
stockCompareData = amexQuery.AsParallel().ToList();
stockData = amexStockDataQuery.AsParallel().ToList();
isGoodToGo = true;
}
else
{
isGoodToGo = false;
}
}
if (completedDates.Contains(date) == false)
{
var sandpDateQuery = from c in sandpInfo
where c.Date >= searchDate && c.Date <= date
select c;
sandpDateInfo = sandpDateQuery.AsParallel().ToList();
symbolClass = new SymbolInfo(symbol, market);
isGoodToGo = true;
}
else
{
isGoodToGo = false;
}
if (isGoodToGo)
{
sendMessage(sandpInfo, date, symbolClass, stockData, stockCompareData);
}
});
}
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
Console.WriteLine(ex.StackTrace);
}
}
Upvotes: 0
Views: 56
Reputation: 2786
What you seem to be doing is overkill and will not bring you that far.
On a first glance I spot couple of lines that I suspect of the N+1 syndrome.
The massive use of AsParallel to create a list or even count won't bring any benefits either.
What worries me most however, you're talking about 4 tables but I count 13 Adapters?
Almost obvious that you are doing all the work client side.
Instead of blindly filling datasets with full table contents and then filter on the result: craft a queries with the data you need without omitting a WHERE clause. Now all your [24 computers] are plowing through the same mess.
And as mentioned in the comments above, you'll be amazed how much you can do with T-SQL.
Preprocess that data server-side; merge, join & filter and perhaps aggregate the results into a 5-th (temp) table.
Let you other computers query against those results with partioning in mind so they all take about 1/24th part of the work.
That is not using AsParallel but being Parallel. Way more efficient, less trouble some and way clearer
Bottom line: Redesign :-)
Upvotes: 1