Reputation: 305
Suppose the following data frame (in reality my data frame has thousands of rows):
year <- c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013)
a1 <- rnorm(10)
a2 <- rnorm(10)
b1 <- rnorm(10)
b2 <- rnorm(10)
c1 <- rnorm(10)
c2 <- rnorm(10)
df <- data.frame(year, a1, a2, b1, b2, c1, c2)
I used the following code to create a list consisting of multiple data frames, which splits the original data frame into subsets by year.
library(stringr)  # provides str_c()
# Split the data set by year
df.list <- split(df, df$year)
# Name each data set "df" plus the year
dfnames <- str_c("df", names(df.list))
names(df.list) <- dfnames
I want to apply the following loop to all data frames of the list:
# df_target stores the results and j is its index
df_target <- NULL
j <- 1
for (i in seq(2, 7, 2)) {
  df_target[[j]] <- (df[i] * df[i + 1]) / sum(df[i + 1])
  j <- j + 1
}
The code works fine for one data frame. However, I want to split the data frame into multiple data frames grouped by year and then loop through the columns of each one.
Thus, I use the following function to apply the loop mentioned above to all data frames from the list:
df_target <- NULL
j <- 1
fnc <- function(x) {
  for (i in seq(2, 7, 2)) {
    df_target[[j]] <- (x[i] * x[i + 1]) / sum(x[i + 1])
    j <- j + 1
  }
}
sapply(df.list, fnc)
With this code I don't get any error messages, but the result for every data frame in the list is NULL. What exactly am I doing wrong?
df_target should be a data frame containing the columns a_new = (a1*a2)/sum(a2), b_new = (b1*b2)/sum(b2) and c_new = (c1*c2)/sum(c2), computed for each year separately.
Upvotes: 0
Views: 1430
Reputation: 9923
Here is a tidyverse solution. Try running it bit by bit so you can see what each step does.
First it adds the row id as a column so that unique rows can be identified later. Then it reshapes the data with pivot_longer to put it into long format, and with pivot_wider to partially reverse this.
Then the data are grouped and the calculation is run. This runs a loop internally.
library(tidyverse)
set.seed(123)
tibble(
  year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013),
  a1 = rnorm(10),
  a2 = rnorm(10),
  b1 = rnorm(10),
  b2 = rnorm(10),
  c1 = rnorm(10),
  c2 = rnorm(10)
) %>%
  rowid_to_column() %>%
  pivot_longer(
    cols = -c(year, rowid),
    names_to = c("nameA", "name12"),
    names_pattern = "(\\w)(\\d)"
  ) %>%
  pivot_wider(names_from = name12, values_from = value) %>%
  group_by(nameA) %>%
  mutate(j = `1` * `2` / (sum(`2`)))
#> # A tibble: 30 x 6
#> # Groups: nameA [3]
#> rowid year nameA `1` `2` j
#> <int> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 2010 a -0.560 1.22 -0.329
#> 2 1 2010 b -1.07 0.426 -0.141
#> 3 1 2010 c -0.695 0.253 -0.0794
#> 4 2 2010 a -0.230 0.360 -0.0397
#> 5 2 2010 b -0.218 -0.295 0.0200
#> 6 2 2010 c -0.208 -0.0285 0.00268
#> 7 3 2010 a 1.56 0.401 0.299
#> 8 3 2010 b -1.03 0.895 -0.285
#> 9 3 2010 c -1.27 -0.0429 0.0245
#> 10 4 2011 a 0.0705 0.111 0.00374
#> # … with 20 more rows
Created on 2020-10-26 by the reprex package (v0.3.0)
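Note that the pipeline above groups by nameA only, so sum(`2`) runs over all years at once. If the per-year result from the question is what you need, one possible variant is to add year to group_by. The following is a minimal sketch under that assumption; the object names df and res are chosen here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(tibble)

set.seed(123)
# Same example data as above, stored in df (name assumed for this sketch)
df <- tibble(
  year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013),
  a1 = rnorm(10), a2 = rnorm(10),
  b1 = rnorm(10), b2 = rnorm(10),
  c1 = rnorm(10), c2 = rnorm(10)
)

res <- df %>%
  rowid_to_column() %>%
  pivot_longer(
    cols = -c(year, rowid),
    names_to = c("nameA", "name12"),
    names_pattern = "(\\w)(\\d)"
  ) %>%
  pivot_wider(names_from = name12, values_from = value) %>%
  # Grouping by year as well makes sum(`2`) a per-year, per-letter sum
  group_by(year, nameA) %>%
  mutate(j = `1` * `2` / sum(`2`)) %>%
  ungroup()
```

With this grouping, the j values for nameA == "a" in 2010 should match a1 * a2 / sum(a2) computed on the 2010 rows alone.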
Upvotes: 1
Reputation: 4456
You need to define j and df_target inside the function and specify what it should return (as it stands, the function computes df_target but doesn't return it):
fnc <- function(x) {
  df_target <- NULL
  j <- 1
  for (i in seq(2, 7, 2)) {
    df_target[[j]] <- (x[i] * x[i + 1]) / sum(x[i + 1])
    j <- j + 1
  }
  return(df_target)
}
But keep in mind that this will output a matrix of lists: for each element of df.list that sapply processes, you create a three-element list df_target, so the output will look like this in the console:
> sapply(df.list, fnc)
df2010 df2011 df2012 df2013
[1,] List,1 List,1 List,1 List,1
[2,] List,1 List,1 List,1 List,1
[3,] List,1 List,1 List,1 List,1
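To see why the version in the question returned NULL for every year, here is a minimal sketch with a hypothetical toy function f and a global object res (both names are made up for illustration). It mirrors the structure of the original fnc: the assignment inside the function only changes a local copy, and the last expression evaluated is the for loop, whose value is NULL.

```r
res <- NULL  # global object, like df_target in the question

f <- function(x) {
  for (i in 1:3) {
    res[[i]] <- x * i  # creates and modifies a local copy of res
  }
  # no return value given: the function returns the value of the for loop, NULL
}

out <- f(10)
is.null(out)  # the function result is NULL
is.null(res)  # the global res was never changed
```

Both checks give TRUE, which is why sapply had nothing but NULL to collect.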
To get a cleaner output, we can make df_target a data frame that holds the values for each year:
fnc <- function(x) {
  df_target <- as.data.frame(matrix(nrow = nrow(x), ncol = 3))
  for (i in seq(2, 7, 2)) {
    df_target[, i / 2] <- (x[i] * x[i + 1]) / sum(x[i + 1])
  }
  return(df_target)
}
This returns a data frame per year, but if we use sapply we'll again get a matrix of lists, so it's better to define the function so that it already loops through every year:
fnc <- function(y) {
  df_target.list <- list()
  k <- 1
  for (j in y) {
    df_target <- as.data.frame(matrix(nrow = nrow(j), ncol = 3))
    for (i in seq(2, 7, 2)) {
      df_target[, i / 2] <- (j[i] * j[i + 1]) / sum(j[i + 1])
    }
    df_target.list[[names(y)[k]]] <- df_target
    k <- k + 1
  }
  return(df_target.list)
}
Output:
> fnc(df.list)
$df2010
V1 V2 V3
1 -0.10971160 0.01688244 -0.16339367
2 0.05440564 0.57554210 -0.06803244
3 0.03185178 0.90598561 -0.68692401
$df2011
V1 V2 V3
1 -0.43090055 0.007152131 0.3930606
2 0.15050644 0.329092942 -0.1367295
3 0.07336839 -0.423631930 -0.1504056
$df2012
V1 V2 V3
1 0.5540294 0.4561862 0.09169914
2 0.1153931 -1.1311450 0.81853691
$df2013
V1 V2 V3
1 0.4322934 0.5286973 0.2136495
2 -0.2412705 0.1316942 0.1455196
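As an alternative to looping over the years by hand, lapply could be used, since it always returns a plain list and never tries to simplify the result into a matrix. The sketch below is one possible way to write it, not the only one; the helper name per_year, the seed, and the use of paste0 instead of str_c are assumptions made here to keep the example free of extra packages.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
df <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2013, 2013),
  a1 = rnorm(10), a2 = rnorm(10),
  b1 = rnorm(10), b2 = rnorm(10),
  c1 = rnorm(10), c2 = rnorm(10)
)

# One data frame per year, named df2010, df2011, ...
df.list <- split(df, df$year)
names(df.list) <- paste0("df", names(df.list))

# Hypothetical helper: computes the three new columns for one year's data
per_year <- function(x) {
  data.frame(
    a_new = x$a1 * x$a2 / sum(x$a2),
    b_new = x$b1 * x$b2 / sum(x$b2),
    c_new = x$c1 * x$c2 / sum(x$c2)
  )
}

# lapply keeps the names of df.list and returns a list of data frames
res <- lapply(df.list, per_year)
```

Here res$df2010 has three rows and res$df2012 has two, matching the number of rows per year in the input.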
Upvotes: 1