lamushidi

Reputation: 303

R: filtering (subsetting) data by character values and assigning names accordingly?

I have a data set raw.data.2010 that needs several steps of subsetting by animal species. I also need to name each subset accordingly after every filtering step. I wrote the simple code below:

#Creating reproducible data######
site=rep(list("Q", "R", "S", "T"), each=500)
grid=sample(1:2, size=2000, replace=TRUE)
spp=rep(list("A", "B", "C", "D", "E"), each=400)
fate=sample(1:5, size=20000, replace=TRUE)
sex=rep(list("M","F"), each=2000)
weight=sample(85:140, size=2000, replace=TRUE)

raw.data=as.data.frame(cbind(site, grid, spp, fate, sex, weight))

### main codes#####
spp=c("A", "B", "C", "D", "E")
    for (i in spp){
        name=paste(i, "raw", sep=".", collapse="")
        filter=paste("select",i, sep="", collapse="")
        assign(filter, raw.data$spp==i)
        assign(name, raw.data[get(filter),])
    }

I checked the filters and they worked without problems. But the last line didn't work, so every subset I called returned NA. What was wrong? Thank you.

EDIT: Hi, thank you all for your advice. I edited my code so it's reproducible. Basically, I would like to first filter my raw.data by spp. Then I can keep adding more filters to group the data by site, grid, fate, etc. I need to be able to access the filtered data individually so I can manipulate them later, e.g. to calculate weight and other measurements for different sex or age groups. I want to be able to call A.raw and A.Q.data later.

I would like to analyze my data at different levels (e.g. population level, individual level, site/grid level) and be able to pool or split them as needed. That's the purpose of this code. I hope my explanation isn't confusing.

Upvotes: 1

Views: 1542

Answers (3)

IRTFM

Reputation: 263362

Your example is all mucked up. Here's a proper example. Never, ever use as.data.frame(cbind(...)): cbind() first builds a matrix, which coerces every column to a single type.

 site=rep(c("Q", "R", "S", "T"), each=500)
 grid=sample(1:2, size=2000, replace=TRUE)
 spp=rep(c("A", "B", "C", "D", "E"), each=400)
 fate=sample(1:5, size=20000, replace=TRUE)
 sex=rep(c("M","F"), each=2000)
 weight=sample(85:140, size=2000, replace=TRUE)

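 # stringsAsFactors=TRUE is needed in R >= 4.0 so that levels() works below;
 # the shorter vectors are recycled to the longest one (20000 rows)
 raw.data=data.frame(site=site, grid=grid, spp=spp, fate=fate, sex=sex, weight=weight, stringsAsFactors=TRUE)
 # split into a list holding one data frame per species
 group.spp <- split(raw.data, raw.data$spp)
 names(group.spp) <- paste(levels(raw.data$spp), "raw", sep=".")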

#------------------------
 str(group.spp)
List of 5
 $ A.raw:'data.frame':  4000 obs. of  6 variables:
  ..$ site  : Factor w/ 4 levels "Q","R","S","T": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ grid  : int [1:4000] 2 1 2 1 2 1 1 1 1 2 ...
  ..$ spp   : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ fate  : int [1:4000] 3 2 3 5 5 2 3 2 5 2 ...
  ..$ sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: int [1:4000] 136 93 115 100 97 128 120 124 97 120 ...
 $ B.raw:'data.frame':  4000 obs. of  6 variables:
  ..$ site  : Factor w/ 4 levels "Q","R","S","T": 1 1 1 1 1 1 1 1 1 1 ...
  ..$ grid  : int [1:4000] 2 2 1 2 2 2 1 2 2 2 ...
  ..$ spp   : Factor w/ 5 levels "A","B","C","D",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ fate  : int [1:4000] 5 5 2 4 3 4 2 3 4 5 ...
  ..$ sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: int [1:4000] 137 126 116 97 97 86 134 103 86 140 ...
 $ C.raw:'data.frame':  4000 obs. of  6 variables:
  ..$ site  : Factor w/ 4 levels "Q","R","S","T": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ grid  : int [1:4000] 1 2 1 2 2 2 1 2 2 1 ...
  ..$ spp   : Factor w/ 5 levels "A","B","C","D",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ fate  : int [1:4000] 2 4 4 2 5 1 2 1 2 5 ...
  ..$ sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: int [1:4000] 132 85 96 87 91 94 94 122 116 87 ...
 $ D.raw:'data.frame':  4000 obs. of  6 variables:
  ..$ site  : Factor w/ 4 levels "Q","R","S","T": 3 3 3 3 3 3 3 3 3 3 ...
  ..$ grid  : int [1:4000] 2 2 2 1 1 2 2 1 1 2 ...
  ..$ spp   : Factor w/ 5 levels "A","B","C","D",..: 4 4 4 4 4 4 4 4 4 4 ...
  ..$ fate  : int [1:4000] 2 4 1 4 2 4 1 5 1 4 ...
  ..$ sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: int [1:4000] 130 139 100 107 126 119 134 110 103 135 ...
 $ E.raw:'data.frame':  4000 obs. of  6 variables:
  ..$ site  : Factor w/ 4 levels "Q","R","S","T": 4 4 4 4 4 4 4 4 4 4 ...
  ..$ grid  : int [1:4000] 2 2 1 1 1 1 2 2 2 1 ...
  ..$ spp   : Factor w/ 5 levels "A","B","C","D",..: 5 5 5 5 5 5 5 5 5 5 ...
  ..$ fate  : int [1:4000] 5 5 4 5 5 3 1 4 4 3 ...
  ..$ sex   : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 2 2 2 2 ...
  ..$ weight: int [1:4000] 88 96 99 101 119 94 97 132 137 115 ...
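
On the as.data.frame(cbind(...)) warning at the top of this answer, a minimal sketch of what goes wrong. The vectors here are small made-up values chosen for illustration, not the question's data:

```r
# Small illustrative vectors (hypothetical values)
spp    <- c("A", "B", "C")
weight <- c(90L, 120L, 135L)

# cbind() builds a matrix, and a matrix holds a single type,
# so the integer weights are coerced to character along with spp
via_cbind <- as.data.frame(cbind(spp, weight))

# data.frame() keeps each column's own type
direct <- data.frame(spp = spp, weight = weight)

str(via_cbind)  # weight is no longer numeric
str(direct)     # weight is still integer
```

With the cbind() route, something like mean(via_cbind$weight) no longer gives a usable number, which is one reason to build the data frame directly.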

Upvotes: 0

Greg Snow

Reputation: 49640

You will probably save yourself a lot of work and grief in the long run if you move away from using global variables with assign and get and instead work with lists (and remember to subset using [[ instead of $).
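
As a rough sketch of what the list-based approach could look like: the data frame below is a small made-up stand-in for raw.data, and the names by.spp and by.spp.site are only illustrative.

```r
set.seed(1)  # arbitrary seed so the sketch is repeatable

# Small stand-in for raw.data (hypothetical values)
raw.data <- data.frame(
  site   = rep(c("Q", "R"), each = 6),
  spp    = rep(c("A", "B", "C"), times = 4),
  sex    = sample(c("M", "F"), size = 12, replace = TRUE),
  weight = sample(85:140, size = 12, replace = TRUE)
)

# One list element per species, instead of A.raw, B.raw, ... as global variables
by.spp <- split(raw.data, raw.data$spp)

# [[ accepts a name held in a variable; by.spp$i would look for
# an element literally called "i"
for (i in names(by.spp)) {
  cat(i, "mean weight:", mean(by.spp[[i]]$weight), "\n")
}

# Second level of filtering: each species split again by site
by.spp.site <- lapply(by.spp, function(d) split(d, d$site))

# The equivalent of the A.Q.data object the question asks for
A.Q.data <- by.spp.site[["A"]][["Q"]]
```

Further levels (grid, fate, sex) can be added the same way, and lapply() or sapply() over the list replaces the loop over generated variable names.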

Upvotes: 8

mkayala

Reputation: 321

The issue seems to be that you need to "get" the variable with the name stored in filter, rather than use filter itself.

This should work:

spp=c("A", "B", "C", "D", "E")
for (i in spp){
    name=paste(i, "raw", sep=".", collapse="")
    filter=paste("select",i, sep="", collapse="")
    assign(filter, raw.data.2010$Spp==i)
    assign(name, raw.data.2010[get(filter),])
}
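
A small sketch of the difference, assuming the NA result came from indexing with the string in filter rather than with the vector it names. The data frame df is a made-up example:

```r
df <- data.frame(spp = c("A", "B", "A"), weight = c(90, 120, 135))

filter <- "selectA"              # just a character string
assign(filter, df$spp == "A")    # creates a logical vector named selectA

# Indexing with the string looks for a row NAMED "selectA";
# no such row exists, so the result is a single row of NA
by_name <- df[filter, ]

# get() fetches the logical vector, so rows 1 and 3 are selected
by_value <- df[get(filter), ]
```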

Upvotes: 3
