lila

Reputation: 13

Generate a data frame for each row of a main data frame and sum the results in R

I have a main data frame with 5 columns (variables) and 1000 rows.

For each row, I want to generate another data frame (10 columns by 100 rows) computed from that row's variables.

The final result should be the element-wise sum of all the resulting data frames.

I'm looking for R code to do this.

Example:

Main data frame:

     |  Var 1   |   Var2   |    Var3   |   Var4   |   Var5   |   Desired Output   
     | -------- | -------- | --------- | -------- | -------- | ------------------ 
  1  |   10     |     25   |     3     |   100    |    10%   |        df1         
  2  |    3     |     32   |     1     |   350    |     5%   |        df2         
  3  |    0     |     45   |     3     |   120    |     2%   |        df3                 
   .
   .
   .
1000 |    5     |     38   |    18     |   430    |     2%   |      df1000        

Desired Output:

The cell-by-cell sum of df1, df2, df3, ..., df1000,

where

df1:

    |    Output1   | Output2 = C2  |  ...  |          Output10                 
    | ------------ | ------------- |       | --------------------------------    
 1  |  (10 * 3)+1  |    10% ^ 1    |       | prod(C2r1:C2r1)= .10            
 2  |  (10 * 3)+2  |    10% ^ 2    |       | prod(C2r1:C2r2)= .11         
 3  |  (10 * 3)+3  |    10% ^ 3    |       | prod(C2r1:C2r3)= .111              
    ...
100 | (10 * 3)+100 |   10% ^ 100   |       | prod(C2r1:C2r100)=0.111111111  

df2:

    |    Output1   | Output2 = C2  |  ...  |          Output10                    
    | ------------ | ------------- |       | --------------------------------  
 1  |  (3 * 1) +1  |     5% ^ 1    |       | prod(C2r1:C2r1)= .05            
 2  |  (3 * 1) +2  |     5% ^ 2    |       | prod(C2r1:C2r2)= .0525       
 3  |  (3 * 1) +3  |     5% ^ 3    |       | prod(C2r1:C2r3)= 0.052625          
   ...
100 |  (3 * 1)+100 |    5% ^ 100   |       | prod(C2r1:C2r100)= 0.0526316   

. . .

df1000:

    |    Output1   | Output2 = C2  |  ...  |          Output10                    
    | ------------ | ------------- |       | --------------------------------   
 1  | (5 * 18) +1  |     2% ^ 1    |       | prod(C2r1:C2r1)= 0.02           
 2  | (5 * 18) +2  |     2% ^ 2    |       | prod(C2r1:C2r2)= 0.0204      
 3  | (5 * 18) +3  |     2% ^ 3    |       | prod(C2r1:C2r3)= 0.020408         
  ...
100 | (5 * 18)+100 |    2% ^ 100   |       | prod(C2r1:C2r100)= 0.02040816

Final Results should be:

final_df

    |              Output1 = C1           |...|          Output10 = C10                    
    |-------------------------------------|---|---------------------------------   
 1  | df1C1r1 + df2C1r1 +...+ df1000C1r1  |   |df1C10r1 +df2C10r1+...+df1000C10r1
 2  | df1C1r2 + df2C1r2 +...+ df1000C1r2  |   |df1C10r2 +df2C10r2+...+df1000C10r2
 3  | df1C1r3 + df2C1r3 +...+ df1000C1r3  |   |df1C10r3 +df2C10r3+...+df1000C10r3
  ...
100 |df1C1r100+df2C1r100+...+ df1000C1r100|   |df1C10r100+df2C10r100+...+df1000C10r100

I tried looping over the rows of the main data frame, but I only get the result for the last row.

Here's my sample code:

main_df <- read_excel (main_df)

for ( i in 1 : nrow (main_df)) { 

   Var1  <-  as.numeric(main_df$Var1[i])
   Var2  <-  as.numeric(main_df$Var2[i])
   Var3  <-  as.numeric(main_df$Var3[i])
   Var4  <-  as.numeric(format(as.numeric(main_df$Var4[i]), nsmall=4))
   Var5  <-  as.numeric(main_df$Var5[i])


   df <- tibble ( t        = 1 : 100,
                  Output1  = ( Var1 * Var3 ) + t,
                  Output2  = ( Var5 ) ^ t,
                  Output10 = cumprod ( Output2 ))
 }


 final_df <-  # sum of all dfs 
 

A way other than looping would be highly appreciated.

Thank you very much!


Say these are the first 5 rows of the main data frame:

[Image: screenshot of the first 5 sample rows]

For periods 1 to 10 only, I should get these results:

[Image: generated results]


Upvotes: 1

Views: 125

Answers (1)

Jon Spring

Reputation: 66435

It's hard to answer a question that doesn't include sample data that we can run directly, but I think something like this could work.

There are a few parts of this I don't understand. For example, isn't Output10 the running sum of those calculated values (e.g. 10% + 10%^2 = 0.11) rather than their product (10% x 10%^2 = 0.001)? Which do you want?
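To make the difference concrete, here is a quick check using the 10% rate from the first row of the question. The names `rate` and `powers` are only illustrative, and I'm assuming the percentage is stored as a proportion (0.10).

```r
rate   <- 0.10          # Var5 from row 1 of the question, as a proportion
powers <- rate ^ (1:3)  # Output2 for periods 1 to 3

running_sum     <- cumsum(powers)   # 0.1, 0.11, 0.111: the values shown for df1
running_product <- cumprod(powers)  # shrinks quickly, so it cannot give 0.11

print(running_sum)
print(running_product)
```

The running sum reproduces the Output10 values listed for df1, which is why the code below uses `cumsum()`.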

library(tidyverse)
main_df |>
  mutate(id = row_number()) |>
  # repeat every row of main_df for periods 1 to 100
  cross_join(tibble(period = 1:100)) |>
  mutate(Output1  = (Var1 * Var3) + period,
         Output2  = Var5 ^ period,
         Output10 = cumsum(Output2), .by = id) |>
  # at this point if you want separate df's, you could use
  # group_by(id) |> group_split()
  summarize(across(Output1:Output10, sum), .by = period)
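If you would rather keep the per-row data frames around, a base R version might look like the sketch below. It rests on several assumptions: the sample rows are copied from the table in the question, Var5 is stored as a proportion rather than the text "10%", only the three output columns that the question defines are built, and `make_outputs()` is a hypothetical helper name, not something from a package.

```r
# Sample rows copied from the table in the question (assumption: Var5 is
# a proportion such as 0.10, not the string "10%").
main_df <- data.frame(
  Var1 = c(10, 3, 0, 5),
  Var2 = c(25, 32, 45, 38),
  Var3 = c(3, 1, 3, 18),
  Var4 = c(100, 350, 120, 430),
  Var5 = c(0.10, 0.05, 0.02, 0.02)
)

# Hypothetical helper: builds the output data frame for one row of main_df.
# Only Output1, Output2 and Output10 are defined in the question.
make_outputs <- function(Var1, Var3, Var5, periods = 1:100) {
  Output2 <- Var5 ^ periods
  data.frame(
    Output1  = Var1 * Var3 + periods,
    Output2  = Output2,
    Output10 = cumsum(Output2)  # running sum, matching the worked values
  )
}

# One data frame per row, stored in a list rather than overwritten.
dfs <- Map(make_outputs, main_df$Var1, main_df$Var3, main_df$Var5)

# Element-wise sum; this works because every data frame has the same
# dimensions and column names.
final_df <- Reduce(`+`, dfs)
head(final_df, 3)
```

Storing each result in a list is what the loop in the question is missing: there, `df` is reassigned on every iteration, so only the last row's data frame survives.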

Upvotes: 1
