Reputation: 13
I have a main data frame with 5 columns (variables) and 1000 rows.
For each row, I want to generate another data frame (10 columns and 100 rows) computed from that row's variables.
The final result should be the cell-wise sum of all the resulting data frames.
I'm looking for R code to do this.
Example:
Main data frame:
| Row | Var1 | Var2 | Var3 | Var4 | Var5 | Desired Output |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 10 | 25 | 3 | 100 | 10% | df1 |
| 2 | 3 | 32 | 1 | 350 | 5% | df2 |
| 3 | 0 | 45 | 3 | 120 | 2% | df3 |
| ... | ... | ... | ... | ... | ... | ... |
| 1000 | 5 | 38 | 18 | 430 | 2% | df1000 |
Desired Output:
sum all cells of df1, df2, df3 until df1000
where
df1:
| Row | Output1 | Output2 = C2 | ... | Output10 |
| --- | --- | --- | --- | --- |
| 1 | (10 * 3) + 1 | 10% ^ 1 | | prod(C2r1:C2r1) = .10 |
| 2 | (10 * 3) + 2 | 10% ^ 2 | | prod(C2r1:C2r2) = .11 |
| 3 | (10 * 3) + 3 | 10% ^ 3 | | prod(C2r1:C2r3) = .111 |
| ... | | | | |
| 100 | (10 * 3) + 100 | 10% ^ 100 | | prod(C2r1:C2r100) = 0.111111111 |
df2:
| Row | Output1 | Output2 = C2 | ... | Output10 |
| --- | --- | --- | --- | --- |
| 1 | (3 * 1) + 1 | 5% ^ 1 | | prod(C2r1:C2r1) = .05 |
| 2 | (3 * 1) + 2 | 5% ^ 2 | | prod(C2r1:C2r2) = .0525 |
| 3 | (3 * 1) + 3 | 5% ^ 3 | | prod(C2r1:C2r3) = 0.052625 |
| ... | | | | |
| 100 | (3 * 1) + 100 | 5% ^ 100 | | prod(C2r1:C2r100) = 0.0526316 |
. . .
df1000:
| Row | Output1 | Output2 = C2 | ... | Output10 |
| --- | --- | --- | --- | --- |
| 1 | (5 * 18) + 1 | 2% ^ 1 | | prod(C2r1:C2r1) = 0.02 |
| 2 | (5 * 18) + 2 | 2% ^ 2 | | prod(C2r1:C2r2) = 0.0204 |
| 3 | (5 * 18) + 3 | 2% ^ 3 | | prod(C2r1:C2r3) = 0.020408 |
| ... | | | | |
| 100 | (5 * 18) + 100 | 2% ^ 100 | | prod(C2r1:C2r100) = 0.02040816 |
Final Results should be:
final_df
| Row | Output1 = C1 | ... | Output10 = C10 |
| --- | --- | --- | --- |
| 1 | df1C1r1 + df2C1r1 + ... + df1000C1r1 | | df1C10r1 + df2C10r1 + ... + df1000C10r1 |
| 2 | df1C1r2 + df2C1r2 + ... + df1000C1r2 | | df1C10r2 + df2C10r2 + ... + df1000C10r2 |
| 3 | df1C1r3 + df2C1r3 + ... + df1000C1r3 | | df1C10r3 + df2C10r3 + ... + df1000C10r3 |
| ... | | | |
| 100 | df1C1r100 + df2C1r100 + ... + df1000C1r100 | | df1C10r100 + df2C10r100 + ... + df1000C10r100 |
I tried looping over each row of the main data frame, but I only get the result for the last row's variables.
Here's the sample code:

    library(readxl)
    library(tibble)

    main_df <- read_excel(main_df)
    for (i in 1:nrow(main_df)) {
      Var1 <- as.numeric(main_df$Var1[i])
      Var2 <- as.numeric(main_df$Var2[i])
      Var3 <- as.numeric(main_df$Var3[i])
      Var4 <- as.numeric(format(as.numeric(main_df$Var4[i]), nsmall = 4))
      Var5 <- as.numeric(main_df$Var5[i])
      df <- tibble(t = 1:100,
                   Output1 = (Var1 * Var3) + t,
                   Output2 = Var5 ^ t,
                   Output10 = cumprod(Output2))
    }
    # final_df <- sum of all dfs

A solution that avoids looping would be highly appreciated.
Thank you very much!
Say these are the 5 rows of the main data frame to be computed:
[Screenshot: the first 5 sample rows]
I should be getting these results, for periods 1 to 10 only.
Upvotes: 1
Views: 125
Reputation: 66435
It's hard to answer a question that doesn't include sample data that we can run directly, but I think something like this could work.
There are a few parts of this I don't understand. For example, isn't Output10 the sum of those calculated ranges (e.g. 10% + 10%^2 = 0.11) and not their product (10% x 10%^2 = 0.001)? Which do you want?

    library(tidyverse)

    main_df |>
      mutate(id = row_number()) |>
      # one row per (main_df row, period) pair; expand_grid() is used here
      # because complete() cannot expand on a column that doesn't exist yet
      expand_grid(period = 1:100) |>
      mutate(Output1 = (Var1 * Var3) + period,
             Output2 = Var5 ^ period,
             Output10 = cumsum(Output2), .by = id) |>
      # at this point if you want separate df's, you could use
      # group_by(id) |> group_split()
      summarize(across(Output1:Output10, sum), .by = period)

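
Since the question has no runnable data, here is a minimal base R sketch of the same idea that can be run as is. The `main_df` below is made up from the first three rows of the question's table (with `Var5` entered as a proportion), `make_df` is a hypothetical helper name, and `Output10` is assumed to be the running sum, since that matches the .10, .11, .111 values in the question. Only the three outputs the question actually defines are computed.

```r
# Hypothetical sample data: the first three rows of the question's table
main_df <- data.frame(
  Var1 = c(10, 3, 0),
  Var2 = c(25, 32, 45),
  Var3 = c(3, 1, 3),
  Var4 = c(100, 350, 120),
  Var5 = c(0.10, 0.05, 0.02)
)

periods <- 1:100

# Build the per-row data frame for row i of main_df
make_df <- function(i) {
  out2 <- main_df$Var5[i] ^ periods
  data.frame(
    Output1  = main_df$Var1[i] * main_df$Var3[i] + periods,
    Output2  = out2,
    Output10 = cumsum(out2)  # assumption: running sum, not cumprod
  )
}

# Keep every per-row data frame in a list instead of overwriting one variable
dfs <- lapply(seq_len(nrow(main_df)), make_df)

# Cell-wise sum of all data frames (they all share the same shape)
final_df <- Reduce(`+`, dfs)
head(final_df, 3)
```

Keeping the per-row results in a list is what the original loop was missing: it assigned to the same `df` on every iteration, so only the last row's result survived. If the product really is what you want, swap `cumsum` for `cumprod`.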
Upvotes: 1