Reputation: 241
I'm currently stuck with my data frame, and I would like to know how to do "subsets of subsets of subsets". Here is a part of my data frame:
YEAR RN DATE NAME SITE LONG SP SUMNB NB100
1 2011 RNN027 15056 ESTAGNOL RNN027-Estagnol 02 310 Anthocharis cardamines (Linnaeus, 1758) 1 0.3225806
2 2011 RNN027 15075 ESTAGNOL RNN027-Estagnol 02 310 Anthocharis cardamines (Linnaeus, 1758) 1 0.3225806
3 2003 RNN027 12166 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
4 2006 RNN027 13252 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
5 2006 RNN027 13257 ESTAGNOL RNN027-Estagnol 03 330 Anthocharis cardamines (Linnaeus, 1758) 2 0.6060606
6 2005 RNN027 12895 ESTAGNOL RNN027-Estagnol 01 540 Anthocharis cardamines (Linnaeus, 1758) 2 0.3703704
My goal is to compute an abundance index for each species. To do that, I have to isolate every count date for every species, every year, and every site.
My first idea was to nest several loops, subsetting at every step by the previous criterion:
DF --> loop SITE; subset of each SITE --> loop YEAR; subset of each YEAR --> loop SP; subset of each SPECIES --> dates of observations
Isolating these dates is needed for further modifications (adding rows), but I need to be able to write the modified subsets back afterwards and reconstruct a new data frame.
I built my loops:
LOOPSITE <- sort(unique(DF$SITE))
for (i in LOOPSITE) {
  print(i)
  LOOPSITESUB <- subset(DF, grepl(i, SITE))
  LOOPYEAR <- sort(unique(LOOPSITESUB$YEAR))
  print(LOOPYEAR)
  for (j in LOOPYEAR) {
    print(j)
    LOOPYEARSUB <- subset(LOOPSITESUB, grepl(j, YEAR))
    LOOPSP <- sort(unique(LOOPYEARSUB$SP))
    print(length(LOOPSP))
    for (k in LOOPSP) {
      print(k)
      LOOPSPSUB <- subset(LOOPYEARSUB, grepl(k, SP))
      print(sum(LOOPYEARSUB$SUMNB))
      print(head(LOOPSPSUB))
    }
  }
}
I can follow what my script is doing thanks to all these "print" commands, and it works until I reach the species subsetting. For an unknown reason, the last subsetting does not work for every species, only for some of them. Here is a part of the output for the last SITE and the last YEAR:
"RNN027-Estagnol 01"
...(I skipped all the sites)
"RNN027-Estagnol 06"
"2003"
...(I skipped all the years)
"2011"
[1] 22
[1] "Aricia agestis D., 1775"
[1] 107
YEAR RN DATE NOM SITE LONG SP SUMNB NB100
66 2011 RNN027 2011-04-21 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 1 0.3846154
67 2011 RNN027 2011-05-22 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 1 0.3846154
68 2011 RNN027 2011-08-05 ESTAGNOL RNN027-Estagnol 06 260 Aricia agestis D., 1775 2 0.7692308
[1] "Brintesia circe (Fabricius, 1775)"
[1] 107
[1] YEAR RN DATE NOM SITE LONG SP SUMNB NB100
<0 rows> (or 0-length row.names)
[1] "Carcharodus alceae (Esper, 1780)"
[1] 107
[1] YEAR RN DATE NOM SITE LONG SP SUMNB NB100
<0 rows> (or 0-length row.names)
It works for "Aricia agestis D., 1775" but not for "Brintesia circe (Fabricius, 1775)". I checked in my data frame that the second species was observed at this time and place, and it has the same format as the previous one... it should be working.
How many loops can I stack like this? Is there another way to do that (one that would be more convenient and faster)? I'm aware of the "split" function, which basically breaks the data frame up into groups, but as I can't work on every "chunk", it doesn't fit my task. Maybe I am wrong.
At the last step (after modifying all the subsets), I should be able to write each subset to a new data frame to reconstruct a modified version of my input.
I may be going about this in completely the wrong way! I can provide further explanations if needed!
Thanks for your help!
EDIT:
I'll try to explain what I want to do. In order to calculate my abundance index, I need to add "blank" rows before and after each temporal "session" of observation. Basically, I am trying to obtain a subset for every combination of 3 different factors (SITE, YEAR and SP).
Here is an example of the type of output I would like to obtain, for every possible SITE X / YEAR Y / SP Z combination:
YEAR RN DATE NAME SITE LONG SP SUMNB NB100
----ADD A NEW ROW----DATE MINUS 7 DAYS-----------------------------------------------------------------------------------
1 Y RNN027 15056 ESTAGNOL RNN027-Estagnol X 310 SP Z 1 0.3225806
2 Y RNN027 15075 ESTAGNOL RNN027-Estagnol X 310 SP Z 1 0.3225806
3 Y RNN027 12166 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
4 Y RNN027 13252 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
5 Y RNN027 13257 ESTAGNOL RNN027-Estagnol X 330 SP Z 2 0.6060606
6 Y RNN027 12895 ESTAGNOL RNN027-Estagnol X 540 SP Z 2 0.3703704
----ADD A NEW ROW----DATE PLUS 7 DAYS-----------------------------------------------------------------------------------
Then I rewrite and compile every modified subset into a new DF.
EDIT 2: Using "split(DF, list(DF$SITE, DF$YEAR, DF$SP))" crashed my computer unless I dropped the unused values. I got exactly what I want, but how can I access and modify every subset?
Upvotes: 1
Views: 152
Reputation: 81733
I suppose you are looking for aggregate.
aggregate(SUMNB ~ SITE + YEAR + SP, DF, sum)
# SITE YEAR SP SUMNB
# 1 RNN027-Estagnol 03 2003 Anthocharis cardamines (Linnaeus, 1758) 2
# 2 RNN027-Estagnol 01 2005 Anthocharis cardamines (Linnaeus, 1758) 2
# 3 RNN027-Estagnol 03 2006 Anthocharis cardamines (Linnaeus, 1758) 4
# 4 RNN027-Estagnol 02 2011 Anthocharis cardamines (Linnaeus, 1758) 2
The command calculates the sum of all values in SUMNB for each combination of SITE, YEAR and SP.
Edit
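Does the following code produce what you're looking for?
do.call(rbind, by(DF, DF[c("SITE", "YEAR", "SP")], FUN = function(x) {
  # duplicate the first and last rows of the group
  tmp <- x[c(1, seq(nrow(x)), nrow(x)), ]
  # shift the duplicated first row back by 7 days
  tmp$DATE[1] <- tmp$DATE[1] - 7
  # shift the duplicated last row forward by 7 days
  tmp$DATE[nrow(tmp)] <- tmp$DATE[nrow(tmp)] + 7
  return(tmp)
}))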
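As for why the loop in the question returns 0 rows for some species: a likely cause is that grepl() treats its first argument as a regular expression, so the parentheses in a name such as "Brintesia circe (Fabricius, 1775)" are read as a regex group rather than as literal characters. The sketch below uses a small hypothetical data frame (toy is not the asker's data; only the species names are taken from the question's output) to show the difference between regex matching, fixed matching and plain equality.

```r
# Hypothetical toy data; species names copied from the question's output.
sp <- c("Aricia agestis D., 1775",
        "Brintesia circe (Fabricius, 1775)",
        "Carcharodus alceae (Esper, 1780)")
toy <- data.frame(SP = sp, SUMNB = c(1, 2, 3), stringsAsFactors = FALSE)

k <- "Brintesia circe (Fabricius, 1775)"

# Regex matching: "(" and ")" delimit a group, so the literal
# parentheses stored in the data are never matched.
n_regex <- nrow(subset(toy, grepl(k, SP)))

# Literal matching: fixed = TRUE switches off regex interpretation.
n_fixed <- nrow(subset(toy, grepl(k, SP, fixed = TRUE)))

# Exact equality, which also rules out partial matches.
n_equal <- nrow(subset(toy, SP == k))

c(regex = n_regex, fixed = n_fixed, equal = n_equal)
```

If this is indeed the cause, comparing with == (or passing fixed = TRUE) in the SITE, YEAR and SP subsets should make the nested loops behave as expected, although the grouped approach above avoids the loops altogether.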
Upvotes: 3
Reputation: 13122
Based on your edits, I believe this could be useful:
set.seed(11)
DF <- data.frame(YEAR = sample(c(2001, 2003), 5, T), #random data
SITE = sample(c("a", "b"), 5, T),
SP = sample(c("sp1", "sp2"), 5, T),
DATE = sample(12345:15678, 5))
res <- lapply(split(DF, list(DF$SITE, DF$YEAR, DF$SP)),
              function(x)
              {
                if(nrow(x) > 0)
                {
                  row1 <- x[1,]
                  names(row1) <- colnames(x)
                  row1["DATE"] <- x$DATE[1] - 7
                  rown <- x[nrow(x),]
                  names(rown) <- colnames(x)
                  rown["DATE"] <- x$DATE[nrow(x)] + 7
                  rbind(row1, x, rown)
                }
              })
DF2 <- do.call(rbind, res)
rownames(DF2) = seq_len(nrow(DF2))
DF
# YEAR SITE SP DATE
#1 2001 b sp1 14257
#2 2001 a sp1 13950
#3 2003 a sp2 13446
#4 2001 b sp2 12870
#5 2001 a sp2 13943
DF2
# YEAR SITE SP DATE
#1 2001 a sp1 13943
#2 2001 a sp1 13950
#3 2001 a sp1 13957
#4 2001 b sp1 14250
#5 2001 b sp1 14257
#6 2001 b sp1 14264
#7 2001 a sp2 13936
#8 2001 a sp2 13943
#9 2001 a sp2 13950
#10 2001 b sp2 12863
#11 2001 b sp2 12870
#12 2001 b sp2 12877
#13 2003 a sp2 13439
#14 2003 a sp2 13446
#15 2003 a sp2 13453
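Regarding the crash mentioned in EDIT 2: split() builds one group per possible SITE/YEAR/SP combination, including those that never occur, which can get very large. A minimal sketch of a variant that passes drop = TRUE so that only observed combinations are kept is shown below. The data frame toy and the names groups, padded and toy2 are made up for illustration, and min()/max() are used instead of the first and last rows so that the result does not depend on the rows being sorted by DATE.

```r
# Hypothetical toy data with the same grouping columns as the question.
toy <- data.frame(YEAR = c(2001, 2001, 2003),
                  SITE = c("a", "a", "b"),
                  SP   = c("sp1", "sp1", "sp2"),
                  DATE = c(13950, 13960, 13446))

# drop = TRUE keeps only the SITE/YEAR/SP combinations that occur,
# so no empty data frames are created.
groups <- split(toy, list(toy$SITE, toy$YEAR, toy$SP), drop = TRUE)

padded <- lapply(groups, function(x) {
  first <- x[1, ]
  last  <- x[nrow(x), ]
  first$DATE <- min(x$DATE) - 7  # padding row one week before the earliest date
  last$DATE  <- max(x$DATE) + 7  # padding row one week after the latest date
  rbind(first, x, last)
})

# Stack the modified groups back into a single data frame.
toy2 <- do.call(rbind, padded)
rownames(toy2) <- NULL
toy2
```

Each element of groups is an ordinary data frame, so any per-group modification can go inside the function passed to lapply().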
Upvotes: 1