mediaeval
mediaeval

Reputation: 143

FoxPro 6.0 error message - temp file is too large

Here is my FoxPro program. The Select clause is causing an error.

PROCEDURE IFRS_Split_Cashflows_2_v7_4_CSV
PARAMETERS model_dir, cube2_output_path, output_path, scenario, version
temp_dir = GETENV("TEMP")

**** Load up utility function and Create an output folder ****
SET PROCEDURE TO model_dir + "\CUBE2_" + version + "\PRGS\Utilities_" + version + ".prg"
lcTextFile = cube2_output_path + "\IFRS_Split_Cashflows_2\run_" + scenario + ".csv"
Do delete_file WITH lcTextFile
Do create_folder WITH cube2_output_path + "\IFRS_Split_Cashflows_2"

**** Assign variables to the input files ****
moses_output_file = output_path + "\annuities~main10.dbf"
SELECT 0
Use (moses_output_file) ALIAS cube_output1
moses_output_file = output_path + "\annuities~main10~reinsured.dbf"
SELECT 0
Use (moses_output_file) ALIAS cube_output2
moses_output_file = output_path + "\annuities~main10~reinsurance_premium.dbf"
SELECT 0
Use (moses_output_file) ALIAS cube_output3

**** Split the group string and select relevant data ****
SELECT ;
      c1.t_from AS t_from, ;
      c1.cal_mth as calendar_month, ;
      c1.cal_yr as calendar_year, ;
      substr(c1.group,1,3) AS BUS_LINE, ;
      padr( IIF( isnumeric( strextract(c1.group, "|", "|", 1, 1), ""), ;
                 substr(c1.group,1,4) + strextract(c1.group, "|", "|", 1, 1), ;
                 strextract( c1.group,"|", "|",1, 1)), 10) AS reinsurance_book,;
      padr( strextract(c1.group, "|","|", 3,1),10) AS Esc_Index,;
      padr( IIF( left(c1.group,3) = "LRT", ;
                 strextract(c1.group, "|", "|",5,1), ;
                 strextract(c1.group,"|","|",4,1)) ,10) AS MAP_ELIG,;
      padr( IIF( left(c1.group,3) = "LRT", ;
                 strextract( trim(c1.group - "|"),"|","|", 6, 1), ;
                 strextract( trim(c1.group - "|"),"|","|", 5,1)), 10) AS MAP_STAT,;
      SUM(c1.ann_ben_if) AS ann_ben_if, ;
      SUM(c2.dth_out) AS c2death_outgo, ;
      SUM(c3.dth_out) AS c3death_outgo;
   FROM ;
      cube_output1 c1, ;
      cube_output2 c2, ;
      cube_output3 c3; 
   INTO ;
      CURSOR temp READWRITE;
   WHERE ;
          c1.t_from = 0 ;
      AND (   ( c1.t_from = c2.t_from AND c1.group = c2.group) ;
           OR ( c1.t_from = c3.t_from AND c1.group = c3.group)) ;
   group by ;
      c1.t_from, ;
      c1.cal_mth, ;
      c1.cal_yr, ;
      BUS_LINE, ;
      reinsurance_book, ;
      Esc_Index, ;
      MAP_ELIG, ;
      MAP_STAT 


**** Copy to a CSV file ****
COPY TO cube2_output_path + "\IFRS_Split_Cashflows_2\run_" + scenario + ".csv" TYPE CSV

It generates an error, "File c:\users\jc\appdata\local\temp\00002jo9000h.tmp is too large. Other task failed."

I actually removed several fields from the select statement, pertaining to c2 and c3. I also changed the first part of the where clause from c1.t_from > 0 to c1.t_from = 0, to cut down on observations.

The error still appears. I understand that the upper limit file size for a temporary file is 2GB. Size-wise, The input files are as follows: c1 is 275MB, C2 is 275MB and c3 is 191MB. Given that, I'm surprised that the temporary file is too large. Why is it too large? And is there any way that I can improve on my Select statement, above, to correctly return the requested fields?

Upvotes: 1

Views: 1086

Answers (2)

DRapp
DRapp

Reputation: 48139

I think you are getting a Cartesian result in your query which is killing you.

Your query is not using any actual column values from your C2 and C3 alias references, just ensuring that the given T_From and Group are found in EITHER of the C2 or C3 tables.

To eliminate your temp file crash, I would ensure you have an index on each of your C2 and C3 tables based on T_From and Group such as

moses_output_file = output_path + "\annuities~main10~reinsured.dbf"
Use (moses_output_file) ALIAS cube_output2 EXCLUSIVE
index on str( T_From ) + Group tag TFromGroup

moses_output_file = output_path + "\annuities~main10~reinsurance_premium.dbf"
SELECT 0
Use (moses_output_file) ALIAS cube_output3 EXCLUSIVE
index on str( T_From ) + Group tag TFromGroup

The indexes can be created ONCE outside this function and will always remain available. Then the index can be utilized here if it does not already exist.

Now, the query fix. Change your FROM/WHERE clause to the following. What I am doing is a SQL-JOIN from your C1 table to the result of a sub-query. Since you only care to pull out records that have a record in EITHER of the other tables, I am doing a PreCheck query to get distinct T_From / Group records found. Then only pulling those C1 records with T_From = 0.

Notice the PreCheck query is also applying a where clause on the T_From = 0, so I don't I can just join on those groups where it WAS found with T_From = 0. You will completely eliminate any possible Cartesian result as the inner query will only produce DISTINCT "Group" records.

select (all other fields the same... last field add comma)

    C2Sum.C2Death_Outgo,;
    C3Sum.C3Death_Outgo;
FROM ;
    cube_output1 c1;
        LEFT JOIN ( select c2.Group,;
                           SUM(c2.dth_out) AS c2death_outgo ;
                       FROM cube_output2 c2;
                       WHERE c2.t_from = 0;
                       GROUP BY c2.Group ) C2Sum ;
           on c1.t_group = C2Sum.Group; ;
        LEFT JOIN ( select c3.Group, ;
                           SUM(c3.dth_out) AS c3death_outgo;
                       FROM cube_output3 c3 ;
                       WHERE c3.t_from = 0 ;
                       GROUP BY c3.Group ) C3Sum ;
        ON c1.t_group = C3Sum.group;
WHERE ;
        c1.t_from = 0 ;
    AND ( NOT ISNULL( c2Sum.Group )  OR NOT ISNULL( c3Sum.Group )) 

What I am doing here is a group by for each individual table to prevent a Cartesian between them both. I am doing a LEFT-JOIN to both of them because not knowing your data, there MIGHT be a record in each C2 / C3 table and don't want to skew the summations.

By finishing the where clause, I am looking for EITHER LEFT JOIN to NOT BE NULL, meaning there IS a record for the given group. Again, EITHER side or BOTH could have a record so it would be counted ONCE for the join and not result in duplicates.

Now, since the group by's amounts are only at the group level, and your OUTER query is grouping by more parts of the bus line, insurance book, index, etc, no matter which sub-classification they are in, the GROUP amount will show the same for all, which MIGHT NOT be what you are looking for.

Finally, your heavy parsing, padding, string extract appear to NOT be a great way of categorizing things, but might be more efficient with individual columns instead... just a thought.

Upvotes: 2

Cetin Basoz
Cetin Basoz

Reputation: 23797

As Drapp said, you are getting a cartesian (not really but close to it) result and thus your temp file is getting huge. If you make a little model of it:

cube1

t_from cal_mth cal_yr ann_be_if group
------ ------- ------ --------- -----
  0      1      2016   100        1
  0      2      2016   800        2
  1      3      2016   500        3

cube2

t_from fieldX group
------ ------ -----
  0      1      1
  0      2      1
  0      3      1

cube3

t_from fieldY group
------ ------ -----
  0      1      2
  0      2      2
  0      3      2
  0      4      2

Using your SQL, this would lead to selection of 2 rows from cube1 3*4 times (your criteria matches to 3 rows in cube2 and 4 rows in cube3) and that would be 24 lines. Since you are repeating selection of the same rows 3*4 times you are also repeating the values of ann_be_if 3*4 times and summing them up would be (800+100)*12 = 10800. That is unlikely what you want.

You shouldn't do aggregation when you have joins (either implicit or explicit) with other tables (unless you are sure you wouldn't have this duplication - i.e. joining sample customer, orders,orditems tables wouldn't exhibit this duplication).

Reading your SQL again, those 2nd and 3rd tables have no roles on the result except that you are only doing an "exists" check on them. So you could rewrite your code as:

**** Assign variables to the input files ****
*!* moses_output_file = output_path + "\annuities~main10.dbf"
*!* SELECT 0
*!* Use (moses_output_file) ALIAS cube_output1
*!* moses_output_file = output_path + "\annuities~main10~reinsured.dbf"
*!* SELECT 0
*!* Use (moses_output_file) ALIAS cube_output2
*!* moses_output_file = output_path + "\annuities~main10~reinsurance_premium.dbf"
*!* SELECT 0
*!* Use (moses_output_file) ALIAS cube_output3

cube_output1 = m.output_path + "\annuities~main10.dbf"
cube_output2 = m.output_path + "\annuities~main10~reinsured.dbf"
cube_output3 = m.output_path + "\annuities~main10~reinsurance_premium.dbf"

Select t_from, ;
    cal_mth As calendar_month, ;
    cal_yr As calendar_year, ;
    substr(c1.Group,1,3) As BUS_LINE, ;
    padr(Iif(isnumeric(Strextract(c1.Group,"|","|",1,1),""), ;
        substr(c1.Group,1,4) + Strextract(c1.Group,"|","|",1,1), ;
        strextract(c1.Group,"|","|",1,1)),10) As reinsurance_book, ;
    padr(Strextract(c1.Group,"|","|",3,1),10) As Esc_Index, ;
    padr(Iif(Left(c1.Group,3)="LRT",;
        strextract(c1.Group,"|","|",5,1),;
        strextract(c1.Group,"|","|",4,1)),10) As MAP_ELIG, ;
    padr(Iif(Left(c1.Group,3)="LRT",;
    strextract(Trim(c1.Group - "|"),"|","|",6,1), ;
    strextract(Trim(c1.Group - "|"),"|","|",5,1)),10) As MAP_STAT, ;
    SUM(ann_ben_if) As ann_ben_if ;
    FROM (m.cube_output1) c1 ;
    WHERE c1.t_from = 0 And ;
    exists ( Select * From (m.cube_output2) c2 ;
             Where c2.t_from = 0 And c1.Group = c2.Group ) ;
    OR ;
    exists ( Select * From (m.cube_output3) c3 ;
             Where c3.t_from = 0 And c1.Group = c3.Group ) ;
    group By t_from, calendar_month, calendar_year, ;
        BUS_LINE, reinsurance_book, Esc_Index, MAP_ELIG, MAP_STAT ;
    Into Cursor temp ;
    Readwrite


**** Copy to a CSV file ****
COPY TO (m.cube2_output_path + "\IFRS_Split_Cashflows_2\run_" + m.scenario + ".csv") TYPE CSV

Upvotes: 0

Related Questions