Reputation: 364
Given a character vector, I would like to loop a function over its elements and assign each result to a variable whose name is built from that element.
uprop is a "data.frame" (1000 observations and 20 columns), as shown in the output below:
> class(uprop)
[1] "data.frame"
Department, Source, Target, and WeightCount are all column names in uprop.
Let us say we need to simplify this repetitive task:
CAST_uprop_data <- subset(uprop, Department == "CAST", select = c(Source, Target, WeightCount))
CHEG_uprop_data <- subset(uprop, Department == "CHEG", select = c(Source, Target, WeightCount))
PHYS_uprop_data <- subset(uprop, Department == "PHYS", select = c(Source, Target, WeightCount))
Here CAST_uprop_data is also a data.frame (100 observations and 3 columns).
I can create a vector variable cust_dept_list with the character names:
cust_dept_list <- c('CAST', 'CHEG', 'PHYS')
But I cannot figure out how to loop through the names, run the subset for each one, and assign each result to its own variable.
Here is my attempt:
for (i in c(cust_dept_list)){
print(paste0(i,"_uprop_data")) <- subset(uprop, Department == i, select = c(Source, Target, WeightCount)), i
}
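As written, the loop body cannot run: the trailing ", i" is a syntax error, and R cannot assign to the result of print(). Creating a variable whose name is held in a string is what base R's assign() does. A minimal sketch of what the attempt was reaching for, assuming a small made-up uprop (the values are invented purely so the code runs):

```r
# Hypothetical stand-in for uprop; values are invented for illustration only
uprop <- data.frame(
  Department  = c("CAST", "CHEG", "PHYS", "CAST", "MATH"),
  Source      = c("a", "b", "c", "d", "e"),
  Target      = c("f", "g", "h", "i", "j"),
  WeightCount = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
cust_dept_list <- c('CAST', 'CHEG', 'PHYS')

for (i in cust_dept_list) {
  # Build the variable name as a string, then bind the subset to that name
  # in the environment the loop runs in (the global environment at top level)
  assign(paste0(i, "_uprop_data"),
         subset(uprop, Department == i, select = c(Source, Target, WeightCount)))
}

print(CAST_uprop_data)
```

get("CAST_uprop_data") retrieves a result back by name, but the answers below explain why a named list is usually a better target than a set of global variables.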
Thanks in advance for helping a novice.
Upvotes: 0
Views: 3968
Reputation: 403
There are only rare cases in which you should be assigning global variables by looping through subsets. I would recommend learning the tidyverse.
If anything below is unfamiliar, please look it up: the %>% (pipe) operator will save you a lot of time and effort, and it makes code easier for others to read.
You will use a "tibble", which is very similar to a data frame. Group by the department and nest the remaining columns, so that each department becomes a single row holding all of its data in a list-column.
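library(tidyverse)
# Dummy stand-in for uprop: 1000 random rows for each of three departments
unprop_data = data.frame(Department = c(rep("CAST",1000),rep("CHEG",1000),rep("PHYS",1000)),
                         Source = rnorm(3000),
                         Target = rnorm(3000),
                         WeightCount = rnorm(3000))
# One row per department; the other columns are nested in a list-column named "data"
grouped_data = unprop_data %>%
  group_by(Department) %>%
  select(Source, Target, WeightCount) %>%  # the grouping column Department is kept automatically
  nest()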
The result follows:
> grouped_data
# A tibble: 3 x 2
Department data
<fctr> <list>
1 CAST <tibble [1,000 x 3]>
2 CHEG <tibble [1,000 x 3]>
3 PHYS <tibble [1,000 x 3]>
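To pull a single department's rows back out of a nested tibble like grouped_data, one option is to name the list-column by department. A minimal sketch, assuming dplyr and tidyr (both loaded by library(tidyverse)) are installed, with a tiny invented data set in place of the random one above; dept_tables is a hypothetical name:

```r
library(dplyr)
library(tidyr)

# Tiny invented stand-in for the data above (illustration only)
unprop_data <- data.frame(
  Department  = c("CAST", "CAST", "CHEG", "PHYS"),
  Source      = c(1, 2, 3, 4),
  Target      = c(5, 6, 7, 8),
  WeightCount = c(10, 20, 30, 40)
)

grouped_data <- unprop_data %>%
  group_by(Department) %>%
  select(Source, Target, WeightCount) %>%
  nest() %>%
  ungroup()

# Name each nested tibble by its department so it can be looked up like a list element
dept_tables <- setNames(grouped_data$data, as.character(grouped_data$Department))

cast <- dept_tables[["CAST"]]
print(cast)
```

The nested tibbles do not contain the grouping column, so cast holds only Source, Target, and WeightCount.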
If for some reason you needed to print each of these in a for loop (rough going at 1000 rows per department), it would look like this:
for(dept in unique(grouped_data$Department)){
  print(dept)
  print("###########################")
  print(
    grouped_data %>%
      filter(Department == dept) %>%
      unnest(cols = data)  # tidyr >= 1.0 expects the list-column to be named
  )
}
Which returns:
[1] "CAST"
[1] "###########################"
# A tibble: 1,000 x 4
Department Source Target WeightCount
<fctr> <dbl> <dbl> <dbl>
1 CAST -0.3781853 -0.59457662 0.2796963
2 CAST 0.7261541 -1.06344758 1.1874874
3 CAST -0.1207312 0.56961950 0.2082236
4 CAST -1.5467661 1.23693964 -0.9732976
5 CAST -1.6626831 0.09252543 -0.3003913
6 CAST -0.2783635 -0.84363946 2.0588511
7 CAST 1.6981061 0.13755764 -0.3935691
8 CAST 0.4900337 -0.73662209 0.8861508
9 CAST 0.3971949 -0.23047428 1.6226582
10 CAST 0.7721574 -0.69117961 -0.4547899
# ... with 990 more rows
[1] "CHEG"
[1] "###########################"
# A tibble: 1,000 x 4
Department Source Target WeightCount
<fctr> <dbl> <dbl> <dbl>
1 CHEG -0.7843984 -0.8788216 0.60030359
2 CHEG -0.5636669 -2.2283878 -0.16178492
3 CHEG 0.9024084 -1.5052453 -1.58803972
4 CHEG 1.7662237 1.2125255 -0.91229428
5 CHEG 0.3950654 -0.8283651 0.07402481
6 CHEG 0.3928973 -1.3650744 -0.75262682
7 CHEG 1.1298127 1.4765888 -0.76059162
8 CHEG 0.4787867 0.6041770 -1.23313321
9 CHEG -1.4474401 -0.6747809 0.78431441
10 CHEG 0.6463868 0.2558378 -1.34131546
# ... with 990 more rows
[1] "PHYS"
[1] "###########################"
# A tibble: 1,000 x 4
Department Source Target WeightCount
<fctr> <dbl> <dbl> <dbl>
1 PHYS 0.1425978 -1.01397581 -0.16573546
2 PHYS -1.2572684 -1.13069956 -0.61870063
3 PHYS 1.2089882 1.51020970 -1.43474343
4 PHYS -0.6357010 -0.07362852 0.06683348
5 PHYS -1.6402587 -1.35273300 0.14436313
6 PHYS -0.9408105 -1.52515527 -0.06860152
7 PHYS 0.3143868 0.11814597 -0.37823801
8 PHYS -0.3232879 0.15408677 -0.62820531
9 PHYS 0.3152122 -0.72634466 -1.71955337
10 PHYS 0.7268282 -0.20872075 0.30780981
# ... with 990 more rows
Upvotes: 1
Reputation: 206606
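Don't create a bunch of different variables; create a named list of data frames instead:
cust_dept_list <- c('CAST', 'CHEG', 'PHYS')
uprop_data <- lapply(cust_dept_list, function(x)
  subset(uprop, Department == x, select = c(Source, Target, WeightCount))
)
# lapply() returns an unnamed list; name it so elements can be looked up by department
names(uprop_data) <- cust_dept_list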
and then you can access the data.frames with
uprop_data[["CAST"]]
uprop_data[["CHEG"]]
...
It will also be easier to loop functions over these data sets in a list for future analyses. See the related answers at "How do I make a list of data.frames?"
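Base R's split() is another way to build the same kind of named list in a single call; it is swapped in here as an alternative to the subset()/lapply() pattern, not taken from the answer. A minimal sketch with a made-up uprop, which also shows looping a function over the list afterwards:

```r
# Hypothetical stand-in for uprop; values are invented for illustration only
uprop <- data.frame(
  Department  = c("CAST", "CHEG", "PHYS", "CAST", "MATH"),
  Source      = c("a", "b", "c", "d", "e"),
  Target      = c("f", "g", "h", "i", "j"),
  WeightCount = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)
cust_dept_list <- c('CAST', 'CHEG', 'PHYS')

# split() returns a list of data frames, one per Department value, named by department
by_dept <- split(uprop[, c("Source", "Target", "WeightCount")], uprop$Department)

# Keep only the departments of interest, in the order given by cust_dept_list
uprop_data <- by_dept[cust_dept_list]

# Looping a function over the list: row count and total WeightCount per department
counts <- sapply(uprop_data, nrow)
totals <- sapply(uprop_data, function(d) sum(d$WeightCount))
print(counts)
print(totals)
```

Because split() creates one element for every department present (MATH included here), indexing by cust_dept_list keeps just the ones you asked for.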
Upvotes: 3