Max Williams

Reputation: 821

Threading for Performance Improvement

I have never used threads because I never thought my code would benefit. However, I think threading might improve the performance of the following pseudocode:

Loop through table of records containing security symbol field and a quote field
    Load a web page (containing a security quote for a symbol) into a string variable
    Parse the string for the quote
    Save the quote in the table
    Get next record
end loop

Loading each web page takes the most time; parsing for the quote is quite fast. I guess I could process, say, half the records in one thread and the other half in a second thread.

Upvotes: 6

Views: 310

Answers (2)

gabr

Reputation: 26830

With OmniThreadLibrary, this problem is easy to solve with a multistage pipeline: the first stage runs on multiple tasks and downloads the web pages, and the second stage runs as a single instance and stores the data in the database. I wrote a blog post documenting this solution some time ago.

The solution can be summed up with the following code (you would have to fill in the bodies of the HttpGet and Inserter routines).

uses
  Windows,        // INFINITE
  SysUtils,       // FreeAndNil
  Classes,        // TStringList
  OtlCommon,
  OtlCollections,
  OtlParallel;

function HttpGet(const url: string; var page: string): boolean;
begin
  // retrieve page contents from the url; return False if page is not accessible
  Result := False; // placeholder until the download code is filled in
end;

procedure Retriever(const input: TOmniValue; var output: TOmniValue);
var
  pageContents: string;
begin
  // TPage is a user-defined class holding the URL and the page contents
  if HttpGet(input.AsString, pageContents) then
    output := TPage.Create(input.AsString, pageContents);
end;

procedure Inserter(const input, output: IOmniBlockingCollection);
var
  page   : TOmniValue;
  pageObj: TPage;
begin
  // connect to database
  for page in input do begin
    pageObj := TPage(page.AsObject);
    // insert pageObj into database
    FreeAndNil(pageObj);
  end;
  // close database connection
end;

procedure ParallelWebRetriever;
var
  pipeline: IOmniPipeline;
  s       : string;
  urlList : TStringList;
begin
  urlList := TStringList.Create;
  try
    // fill urlList with one URL per security symbol
    // set up pipeline
    pipeline := Parallel.Pipeline
      .Stage(Retriever).NumTasks(Environment.Process.Affinity.Count * 2)
      .Stage(Inserter)
      .Run;
    // insert URLs to be retrieved
    for s in urlList do
      pipeline.Input.Add(s);
    pipeline.Input.CompleteAdding;
    // wait for pipeline to complete
    pipeline.WaitFor(INFINITE);
  finally
    FreeAndNil(urlList);
  end;
end;
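
The code above refers to a TPage class without declaring it. Here is a minimal sketch of what such a class might look like, assuming it only needs to carry the URL and the downloaded text from the Retriever stage to the Inserter stage. The unit name, field names, and property names are hypothetical and are not taken from the answer.

```pascal
unit PageData; // hypothetical unit name

interface

type
  // Hypothetical container handed from the Retriever stage to the Inserter stage.
  TPage = class
  private
    FUrl: string;
    FContents: string;
  public
    constructor Create(const AUrl, AContents: string);
    property Url: string read FUrl;
    property Contents: string read FContents;
  end;

implementation

constructor TPage.Create(const AUrl, AContents: string);
begin
  inherited Create;
  FUrl := AUrl;
  FContents := AContents;
end;

end.
```

Because the Inserter stage frees each object after storing it, a plain class with no owner is enough here.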

Upvotes: 4

Remy Lebeau

Reputation: 596297

If the number of records is relatively small, say 50 or fewer, you could just launch a separate thread for each record and let them all run in parallel, e.g.:

begin thread
  Load a web page for symbol into a string variable
  Parse the string for the quote
  Save the quote in the table
end thread

Loop through table of records
  Launch a thread for current security symbol
  Get next record
end loop
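
The thread-per-record pseudocode above could be sketched in Delphi roughly as follows. This is only one possible reading, assuming a Delphi version with generics and TThread.Start (2010 or later). FetchQuote and the symbol list are hypothetical stand-ins for the real download, parsing, and table access. Unlike the pseudocode, this sketch hands each quote back to the main thread instead of saving it from inside the worker, which is one way to avoid sharing a dataset between threads.

```pascal
program ThreadPerSymbol;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Generics.Collections;

type
  // One worker thread per security symbol.
  TQuoteThread = class(TThread)
  private
    FSymbol: string;
    FQuote: string;
  protected
    procedure Execute; override;
  public
    constructor Create(const ASymbol: string);
    property Symbol: string read FSymbol;
    property Quote: string read FQuote;
  end;

// Hypothetical stand-in for "load the web page and parse the quote".
function FetchQuote(const ASymbol: string): string;
begin
  Sleep(100); // simulates network latency
  Result := 'quote for ' + ASymbol;
end;

constructor TQuoteThread.Create(const ASymbol: string);
begin
  inherited Create(True); // created suspended; started explicitly by the caller
  FreeOnTerminate := False; // the caller waits for the thread and frees it
  FSymbol := ASymbol;
end;

procedure TQuoteThread.Execute;
begin
  FQuote := FetchQuote(FSymbol);
end;

var
  Threads: TObjectList<TQuoteThread>;
  Worker: TQuoteThread;
  Symbol: string;
begin
  Threads := TObjectList<TQuoteThread>.Create(True); // the list owns the threads
  try
    // Launch one thread per symbol (illustrative symbols only).
    for Symbol in TArray<string>.Create('AAPL', 'MSFT', 'IBM') do
    begin
      Worker := TQuoteThread.Create(Symbol);
      Threads.Add(Worker);
      Worker.Start;
    end;
    // Wait for each thread, then handle its result in the main thread.
    for Worker in Threads do
    begin
      Worker.WaitFor;
      Writeln(Worker.Symbol, ': ', Worker.Quote); // save to the table here
    end;
  finally
    Threads.Free;
  end;
end.
```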

If you have a larger number of records to process, consider using a pool of threads so you can handle records in smaller batches, e.g.:

Create X threads
Put threads in a list

Loop through table of records
  Wait until a thread in pool is idle
  Get idle thread from pool
  Assign current security symbol to thread
  Signal thread
  Get next record
end loop

Wait for all threads to be idle
Terminate threads

begin thread
  Loop until terminated
    Mark idle
    Wait for signal
    If not Terminated
      Load a web page for current symbol into a string variable
      Parse the string for the quote
      Save the quote in the table
    end if
  end loop
end thread

There are many different ways you could implement the above, which is why I left it in pseudocode. Look at the VCL's TThread, TList, and TEvent classes, the Win32 API QueueUserWorkItem() function, or any number of third-party threading libraries.
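
As one more option beyond those listed, newer Delphi versions (XE7 and later) ship a System.Threading unit whose TParallel.For runs loop iterations on a built-in thread pool. The sketch below swaps that in for the hand-written pool in the pseudocode, so it is a different technique with the same goal. FetchQuote and the symbol list are hypothetical stand-ins for the real download, parsing, and table access.

```pascal
program PooledQuotes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Threading;

// Hypothetical stand-in for "load the web page and parse the quote".
function FetchQuote(const ASymbol: string): string;
begin
  Sleep(100); // simulates network latency
  Result := 'quote for ' + ASymbol;
end;

var
  Symbols, Quotes: TArray<string>;
  I: Integer;
begin
  // Illustrative symbols only; in practice these come from the table.
  Symbols := TArray<string>.Create('AAPL', 'MSFT', 'IBM', 'ORCL');
  SetLength(Quotes, Length(Symbols));

  // Iterations are distributed over the library's default thread pool.
  // Each iteration writes only its own array slot, so no locking is needed.
  TParallel.For(0, High(Symbols),
    procedure(Index: Integer)
    begin
      Quotes[Index] := FetchQuote(Symbols[Index]);
    end);

  // TParallel.For returns once all iterations have finished,
  // so the results can be saved from the main thread here.
  for I := 0 to High(Symbols) do
    Writeln(Symbols[I], ': ', Quotes[I]);
end.
```

Collecting the quotes into an array and writing them to the table afterwards keeps all database access on one thread, at the cost of holding the results in memory until the loop completes.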

Upvotes: 4
