Reputation: 1
I am having trouble looping over data frames by name and have no idea how to fix it. I am using census data from multiple years and have to apply the same operations to multiple datasets.
Here is a simplified example of what I want to do. I create a data frame called df1 and make two copies of it, df2 and df3. Let's say that for each data frame I want to create a third variable, v3, where v3 = v1 + v2.
The loop I wrote doesn't work, and I don't know how to loop over data frames correctly by name.
v1<-c(1:10)
v2<-c(1:10)
df1<-data.frame(v1,v2)
df2<-df1
df3<-df1
x<-c("df1","df2","df3")
for (i in x) {v3<-v1+v2}
Upvotes: 0
Views: 62
Reputation: 10232
What you are looking for are the get() and assign() functions.
For example, you can use them like this:
df1 <- data.frame(v1 = 1:10, v2 = 1:10)
df2 <- df1
df3 <- df1
x <- c("df1","df2","df3")
for (i in x) {
  # load the dataset "i" to the tmp-variable
  tmp <- get(i)
  # do something
  tmp$v3 <- tmp$v1 + tmp$v2
  # assign the tmp variable to the value of "i" again
  assign(i, tmp)
}
# let's check the result
df1
#>    v1 v2 v3
#> 1   1  1  2
#> 2   2  2  4
#> 3   3  3  6
#> 4   4  4  8
#> 5   5  5 10
#> 6   6  6 12
#> 7   7  7 14
#> 8   8  8 16
#> 9   9  9 18
#> 10 10 10 20
df2
#>    v1 v2 v3
#> 1   1  1  2
#> 2   2  2  4
#> 3   3  3  6
#> 4   4  4  8
#> 5   5  5 10
#> 6   6  6 12
#> 7   7  7 14
#> 8   8  8 16
#> 9   9  9 18
#> 10 10 10 20
df3
#>    v1 v2 v3
#> 1   1  1  2
#> 2   2  2  4
#> 3   3  3  6
#> 4   4  4  8
#> 5   5  5 10
#> 6   6  6 12
#> 7   7  7 14
#> 8   8  8 16
#> 9   9  9 18
#> 10 10 10 20
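As a side note on why the loop in the question had no visible effect, here is a minimal sketch (it only re-creates the objects from the question, nothing else is assumed). Inside the loop, i is just a character string such as "df1", and v1 + v2 refers to the standalone vectors in the workspace, so none of the data frames is ever touched:

```r
# re-create the objects from the question
v1 <- 1:10
v2 <- 1:10
df1 <- data.frame(v1, v2)
x <- c("df1")

for (i in x) {
  # i holds the name of the data frame as a string, not the data frame itself
  print(class(i))
  #> [1] "character"

  # this adds the standalone vectors v1 and v2 and stores the result
  # in a standalone vector v3; df1 is not involved at all
  v3 <- v1 + v2
}

# df1 still has only its two original columns
"v3" %in% names(df1)
#> [1] FALSE
```

This is why the get()/assign() loop above first fetches the object by its name, modifies the copy, and then writes it back under the same name.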
Having said that, you probably don't want to work with get() and assign() like this; instead, try the apply family of functions.
I tend to use lapply() a lot. In your case it would look like this:
# create some data again
df <- data.frame(v1 = 1:10, v2 = 1:10)
# create three data frames in a list
# here you would, for example, load the data frames from your source into the list
df_list <- lapply(1:3, function(x) df)
str(df_list)
#> List of 3
#> $ :'data.frame': 10 obs. of 2 variables:
#> ..$ v1: int [1:10] 1 2 3 4 5 6 7 8 9 10
#> ..$ v2: int [1:10] 1 2 3 4 5 6 7 8 9 10
#> $ :'data.frame': 10 obs. of 2 variables:
#> ..$ v1: int [1:10] 1 2 3 4 5 6 7 8 9 10
#> ..$ v2: int [1:10] 1 2 3 4 5 6 7 8 9 10
#> $ :'data.frame': 10 obs. of 2 variables:
#> ..$ v1: int [1:10] 1 2 3 4 5 6 7 8 9 10
#> ..$ v2: int [1:10] 1 2 3 4 5 6 7 8 9 10
# do some operations:
df_list2 <- lapply(df_list, function(d) {
  # do something
  d$v3 <- d$v1 + 100 * d$v2
  return(d)
})
df_list2
#> [[1]]
#>    v1 v2   v3
#> 1   1  1  101
#> 2   2  2  202
#> 3   3  3  303
#> 4   4  4  404
#> 5   5  5  505
#> 6   6  6  606
#> 7   7  7  707
#> 8   8  8  808
#> 9   9  9  909
#> 10 10 10 1010
#>
#> [[2]]
#>    v1 v2   v3
#> 1   1  1  101
#> 2   2  2  202
#> 3   3  3  303
#> 4   4  4  404
#> 5   5  5  505
#> 6   6  6  606
#> 7   7  7  707
#> 8   8  8  808
#> 9   9  9  909
#> 10 10 10 1010
#>
#> [[3]]
#>    v1 v2   v3
#> 1   1  1  101
#> 2   2  2  202
#> 3   3  3  303
#> 4   4  4  404
#> 5   5  5  505
#> 6   6  6  606
#> 7   7  7  707
#> 8   8  8  808
#> 9   9  9  909
#> 10 10 10 1010
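If the data frames already exist as separate objects (df1, df2, df3 as in the question), one possible way to get them into a list without re-creating them is base R's mget(), which looks up several objects by name and returns them as a named list. The following is only a sketch under that assumption; the final list2env() step is optional and only needed if you really want the modified data frames back as individual objects:

```r
# the three data frames from the question
df1 <- data.frame(v1 = 1:10, v2 = 1:10)
df2 <- df1
df3 <- df1

# collect the existing objects by name into a named list
x <- c("df1", "df2", "df3")
df_list <- mget(x, envir = environment())

# apply the same operation to every data frame in the list
df_list <- lapply(df_list, function(d) {
  d$v3 <- d$v1 + d$v2
  d
})

# lapply() keeps the names, so each result can be accessed by name
names(df_list)
#> [1] "df1" "df2" "df3"

# optional: write the modified data frames back to the current environment,
# overwriting df1, df2 and df3
list2env(df_list, envir = environment())
```

Keeping everything in the named list is usually the more convenient choice, since any further steps can again be done with lapply().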
Upvotes: 2