user125264

Reputation: 1827

Best approach using cfthread

I'm looking at processing a query within a cfthread, as the original data could be thousands of requests. I want to use cfthread so the work can run in the background. I know there is a lot of information on cfthread, but getting my head around it has been difficult.

What this all comes down to is a number of remote calls using cfhttp, and whether they should all run in a single thread or each in its own.

The remote calls could take between 5 and 10 seconds each, and the database updates are very minor: simply setting a true/false value on each record once it has been processed.

<cfthread action="run" name="myThreadName" priority="high">
    <!--- do a query --->
    <cfloop query="myQuery">
        <!--- do a remote call --->
        <!--- process remote call response --->
        <!--- update local dbtables to indicate process is complete --->
        <!--- sleep using <cfset sleep(5000)> --->
    </cfloop>
</cfthread>

Or is this the better basic use of cfthread for this process?

<cfloop from="1" to="1000" index="idx">
    <cfthread action="run" name="myThreadName_#idx#" priority="high">
        <!--- do a query --->
        <!--- do a remote call --->
        <!--- process remote call response --->
        <!--- update local dbtables to indicate process is complete --->
        <!--- sleep using <cfset sleep(5000)> --->
    </cfthread>
</cfloop>

I'm trying to find the best balance so I don't crash my servers but can still handle a lot of these requests to fetch information from the external service. I'm struggling to decide which direction is best, or whether there is a better process altogether for handling the remote requests en masse.

Thanks in advance.

Upvotes: 1

Views: 439

Answers (1)

Alex

Reputation: 7833

You can't just spawn n threads and hope for the best. Too many threads will cause the CPU scheduler to switch context far too often and thus cause an overall slowdown. I also don't see why you would want the thread(s) to run at high(er) priority, given that there are other things on the machine that need CPU time. It would simply delay other threads, while the many high-priority threads would still compete for CPU time among each other. With a fixed number of threads there's also no need for a thread pause (sleep).

You need to combine the best of the two approaches. Consider something like this:

<cfset numberOfThreads              = 8>
<cfset numberOfRemoteCallsPerThread = 4>

<cfloop from="1" to="#numberOfThreads#" index="threadIndex">

    <!--- do a query and split the number of records to process by dividing/offsetting --->

    <!--- pass the loop index in as an attribute: attributes are copied per thread, while the loop variable itself keeps changing --->
    <cfthread action="run" name="myThread_#threadIndex#" threadIndex="#threadIndex#">
        <!--- do a remote call --->
        <!--- process remote call response --->
        <!--- update local dbtables to indicate process is complete --->
    </cfthread>

</cfloop>

This will parallelize your processing efficiently. Each thread will process a pre-determined set of records. To split the records, you could do something along the lines of:

SELECT
    <data to send>

FROM
    <table holding the records>

WHERE
    <filter records to process>

ORDER BY
    <stable key, so the slices never overlap>

LIMIT
    #((threadIndex - 1) * numberOfRemoteCallsPerThread)#, #numberOfRemoteCallsPerThread#

What works best here depends on how you acquire the records that need processing. You might, for example, need to determine the records (or at least how many there are) before starting the loop.
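To make the splitting idea a bit more concrete, here is a rough sketch that fetches the pending records once and hands each thread a contiguous range of rows. Treat everything specific in it as an assumption for illustration: the datasource `myDatasource`, the table `pending_requests` with its `id` and `processed` columns, and the `https://example.com/api` endpoint are all made up, and the thread count of 8 is just the number used above, not a tuned value.

```cfml
<!--- Sketch only: datasource, table, columns and URL are hypothetical placeholders. --->
<cfset numberOfThreads = 8>

<!--- Fetch the pending records once, before any thread starts. --->
<cfquery name="pending" datasource="myDatasource">
    SELECT   id
    FROM     pending_requests
    WHERE    processed = 0
    ORDER BY id
</cfquery>

<!--- Rows per thread, rounded up so that no row is left over. --->
<cfset rowsPerThread = ceiling(pending.recordCount / numberOfThreads)>

<cfloop from="1" to="#numberOfThreads#" index="threadIndex">
    <cfset startRow = (threadIndex - 1) * rowsPerThread + 1>
    <cfset endRow   = min(threadIndex * rowsPerThread, pending.recordCount)>

    <!--- Skip threads that would have nothing to do. --->
    <cfif startRow LTE endRow>
        <!--- Attribute values are copied when the thread is created, so each
              thread keeps its own row range and its own copy of the query. --->
        <cfthread action="run" name="worker_#threadIndex#"
                  startRow="#startRow#" endRow="#endRow#" records="#pending#">

            <cfloop from="#attributes.startRow#" to="#attributes.endRow#" index="row">
                <cftry>
                    <!--- Remote call; the timeout stops a hung service from blocking the thread forever. --->
                    <cfhttp url="https://example.com/api" method="get" timeout="15" result="httpResult">
                        <cfhttpparam type="url" name="id" value="#attributes.records.id[row]#">
                    </cfhttp>

                    <!--- Mark the record as done only when the call succeeded. --->
                    <cfif left(httpResult.statusCode, 3) EQ "200">
                        <cfquery datasource="myDatasource">
                            UPDATE pending_requests
                            SET    processed = 1
                            WHERE  id = <cfqueryparam value="#attributes.records.id[row]#" cfsqltype="cf_sql_integer">
                        </cfquery>
                    </cfif>

                    <cfcatch type="any">
                        <!--- One failed call should not end the whole thread. --->
                        <cflog file="remoteCalls" type="error" text="Row #row# failed: #cfcatch.message#">
                    </cfcatch>
                </cftry>
            </cfloop>

        </cfthread>
    </cfif>
</cfloop>
```

With 1000 pending records and 8 threads, each thread would work through 125 rows one after another, so at most 8 remote calls are in flight at any time. Records that fail stay flagged as unprocessed and get picked up again on the next run.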

Upvotes: 1
