Max Boy
Max Boy

Reputation: 327

ColdFusion Query too slow

I have queries inside a cfloop that makes the process very slow. Is there a way to make this query faster?

<cfquery name="GetCheckRegister" datasource="myDB">
    SELECT * FROM CheckRegister, ExpenseType 
    Where PropertyID=10
    and ExpenseType.ExpenseTypeID=CheckRegister.ExpenseTypeID 
</cfquery>

<CFOUTPUT query=GetCheckRegister>
    <cfquery name="GetVendorName" datasource="myDB"> SELECT * FROM Vendors WHERE VendorID=#VendorID#</cfquery>
    <!--- I use the vendor name here --->

    <cfset local.CreditDate = "" />
  <cfquery name="getTenantTransactionDateFrom" dataSource="myDB">
    Select TenantTransactionDate as fromDate From TenantTransactions
    Where CheckRegisterID = #CheckRegisterID#
    Order By TenantTransactionDate Limit 1
  </cfquery>
  <cfquery name="getTenantTransactionDateTo" dataSource="myDB">
    Select TenantTransactionDate as ToDate From TenantTransactions
    Where CheckRegisterID = #CheckRegisterID#
    Order By TenantTransactionDate desc Limit 1
  </cfquery>
  <cfif getTenantTransactionDateFrom.fromDate neq "" AND getTenantTransactionDateTo.ToDate neq "">
    <cfif getTenantTransactionDateFrom.fromDate eq getTenantTransactionDateTo.ToDate>
      <cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') />
    <cfelse>
      <cfset local.CreditDate = DateFormat(getTenantTransactionDateFrom.fromDate, 'mm/dd/yyyy') & " - " & DateFormat(getTenantTransactionDateTo.ToDate, 'mm/dd/yyyy') />
    </cfif>
  </cfif>

  <!--- I use the local.CreditDate here  --->

  <!--- Here goes a table with the data --->
</CFOUTPUT>

cfoutput works like a loop.

Upvotes: 3

Views: 947

Answers (5)

Shawn
Shawn

Reputation: 4786

You should get all of your data in one query, then work with that data to output what you want. Multiple connections to a database almost always be more resource-intensive than getting the data in one trip and working with it. To get your results:

SQL Fiddle

Initial Schema Setup:

CREATE TABLE CheckRegister ( checkRegisterID int, PropertyID int, VendorID int, ExpenseTypeID int ) ;
CREATE TABLE ExpenseType ( ExpenseTypeID int ) ;
CREATE TABLE Vendors ( VendorID int ) ;
CREATE TABLE TenantTransactions ( checkRegisterID int, TenantTransactionDate date, note varchar(20) );

INSERT INTO CheckRegister ( checkRegisterID, PropertyID, VendorID, ExpenseTypeID )
VALUES (1,10,1,1),(1,10,1,1),(1,10,2,1),(1,10,1,2),(1,5,1,1),(2,10,1,1),(2,5,1,1) 
;

INSERT INTO ExpenseType ( ExpenseTypeID ) VALUES (1), (2) ;
INSERT INTO Vendors ( VendorID ) VALUES (1), (2) ;

INSERT INTO TenantTransactions ( checkRegisterID, TenantTransactionDate, note )
VALUES 
    (1,'2018-01-01','start')
  , (1,'2018-01-02','another')
  , (1,'2018-01-03','another')
  , (1,'2018-01-04','stop')
  , (2,'2017-01-01','start')
  , (2,'2017-01-02','another')
  , (2,'2017-01-03','another')
  , (2,'2017-01-04','stop')
 ;

Main Query:

SELECT cr.*
  , max(tt.TenantTransactionDate) AS startDate
  , min(tt.TenantTransactionDate) AS endDate
FROM CheckRegister cr 
INNER JOIN ExpenseType et ON cr.ExpenseTypeID = et.ExpenseTypeID
INNER JOIN Vendors v ON cr.vendorID = v.VendorID 
LEFT OUTER JOIN TenantTransactions tt ON cr.checkRegisterID = tt.CheckRegisterID
WHERE cr.PropertyID = 10
GROUP BY cr.CheckRegisterID, cr.PropertyID, cr.VendorID, cr.ExpenseTypeID

Results:

| checkRegisterID | PropertyID | VendorID | ExpenseTypeID |  startDate |    endDate |
|-----------------|------------|----------|---------------|------------|------------|
|               1 |         10 |        1 |             1 | 2018-01-04 | 2018-01-01 |
|               1 |         10 |        1 |             2 | 2018-01-04 | 2018-01-01 |
|               1 |         10 |        2 |             1 | 2018-01-04 | 2018-01-01 |
|               2 |         10 |        1 |             1 | 2017-01-04 | 2017-01-01 |

I only added 2 check registers, but CheckRegisterID 1 has 2 vendors and 2 Expense Types for Vendor 1. This will look like repeated data in your query. If your data isn't set up that way, you won't have to worry about it in the final query.

Use proper JOIN syntax to get the related data you need. Then you can aggregate that data to get the fromDate and toDate. If your data is more complex, you may need to look at Window Functions. https://dev.mysql.com/doc/refman/8.0/en/window-functions.html

I don't know what your final output looks like, but the above query gives you all of the query data in one pass. And each row of that data should give you what you need to output, so now you've only got one query to loop over.

Upvotes: 5

SOS
SOS

Reputation: 6560

As others have said, you should get rid of the loop and use joins. Looking at your inner loop, the code retrieves the earliest and latest date for each CheckRegisterID. Instead of using LIMIT, use aggregate functions like MIN and MAX and GROUP BY CheckRegisterID. Then wrap that result in a derived query so you can join the results back to CheckRegister ON id.

Some of the columns in the original query aren't scoped, so I took a few guesses. There's room for improvement, but something like is enough to get you started.

-- select only needed columns
SELECT cr.CheckRegisterID, ... other columns
FROM CheckRegister cr 
       INNER JOIN ExpenseType ex ON ex.ExpenseTypeID=cr.ExpenseTypeID 
       INNER JOIN Vendors v ON v.VendorID = cr.VendorID
       LEFT JOIN 
       (
            SELECT CheckRegisterID
              , MIN(TenantTransactionDate) AS MinDate
              , MAX(TenantTransactionDate) AS MaxDate
            FROM  TenantTransactions
            GROUP BY CheckRegisterID
       ) tt ON tt.CheckRegisterID = cr.CheckRegisterID

WHERE cr.PropertyID = 10

I'd highly recommend reading up on JOIN's as they're critical to any web application, IMO.

Upvotes: 6

Alex
Alex

Reputation: 7833

You always want to avoid having queries in a loop. Whenever you query the database, you have roundtrip (from server to database and back from database to server) which is slow by nature.

A general approach is to bulk data by querying all required information with as few statements as possible. Joining everything in a single statement would be ideal, but this obviously depends on your table schemes. If you cannot solve it using SQL only, you can transform your queries like this:

GetCheckRegister...

(no loop)

<cfquery name="GetVendorName" datasource="rent">
    SELECT * FROM Vendors WHERE VendorID IN (#valueList(GetCheckRegister.VendorID)#)
</cfquery>

<cfquery name="getTenantTransactionDateFrom" dataSource="rent">
    Select TenantTransactionDate as fromDate From TenantTransactions
    Where CheckRegisterID IN (#valueList(GetCheckRegister.CheckRegisterID)#)
</cfquery>

etc.

valueList(query.column) returns a comma delimited list of the specified column values. This list is then used with MySQL's IN (list) selector to retrieve all records that belong to all the listed values.

Now you would only have a single query for each statement in your loop (4 queries total, instead of 4 times number of records in GetCheckRegister). But all records are clumped together, so you need to match them accordingly. To do this we can utilize ColdFusion's Query of Queries (QoQ), which allows you to query on already retrieved data. Since the retrieved data is in memory, accessing them is quick.

GetCheckRegister, GetVendorName, getTenantTransactionDateFrom, getTenantTransactionDateTo etc.

<CFOUTPUT query="GetCheckRegister">

    <!--- query of queries --->
    <cfquery name="GetVendorNameSingle" dbType="query">
        SELECT * FROM [GetVendorName] WHERE VendorID = #GetCheckRegister.VendorID#
    </cfquery>

    etc.

</CFOUTPUT>

You basically moved the real queries out of the loop and instead query the result of the real queries in your loop using QoQ.

Regardless of this, make sure your real queries are fast by profiling them in MySQL. Use indices!

Upvotes: 2

F.Igor
F.Igor

Reputation: 4410

Using the main query and the loop to process the data could be faster if:

  • Using SELECT with only specific fields you need, to avoid fetching so many columns (instead of SELECT *), unless you are using all the fields:

SELECT VendorID, CheckRegisterId, ... FROM CheckRegister, ExpenseType ...

  • Using less subqueries in the loop, trying to join the table to the main query. For example, using the Vendors table in the main query (if it could be posible to join this table)

SELECT VendorID, CheckRegisterId, VendorName ... FROM CheckRegister, ExpenseType, Vendors ...

  • Finally, you can estimate the time of the process and detect the performance problem:
    • ROWS = Number of rows of the result in the main query
    • TIME_V = Time (ms) to get the result of GetVendorName using a valid VendorId
    • TIME_TD1 = Time (ms) to get the result of getTenantTransactionDateFrom using a valid CheckRegisterID
    • TIME_TD2 = Time (ms) to get the result of getTenantTransactionDateTo using a valid CheckRegisterID
  • Then, you can calculate the resulting time using TOTAL = ROWS * (TIME_V+ TIME_TD1 + TIME_TD2).
  • For example, if ROWS=10000 , TIME_V = 30, TIME_TD1 = 15, TIME_TD2 = 15 : RESULT = 10000 * (30 + 15 + 15) = 10000 * 60 = 600000 (ms) = 600 (sec) = 10 min

So, for 10000 rows, one milisecond of the loop results in 10 seconds added to the process.

When you have many resulting rows for the main query, you need to minimize the query time of each element in the loop. Each milisecond affects in the performance of the loop. So you need to make sure there are the right indexes for each field filtered for each query in the loop.

Upvotes: 0

hideki
hideki

Reputation: 1340

It has been a long time since I did any ColdFusion development but a common rule of thumb would be to not call queries within a loop. Depending on what you are doing, loops can be considered an RBAR (row by agonizing row) operation.

You are essentially defining one query and looping over each record. For each record, you are doing three additional queries aka three additional database network calls per record. The way I see it, you have a couple options:

  1. Rewrite your first query to already include the data you need within each record check.
  2. Leave your first query the way it is and create functionality that provides more information when the user interacts with the record and do it asynchronously. Something like a "Show Credit Date" link which goes out and gets the data on demand.
  3. Combine the queries in your loop to be one query instead of the two getTenantTransaction... and see if performance improves. This reduces the RBAR database calls from three to two.

Upvotes: 3

Related Questions