Reputation: 25
I wrote the following code to extract multiple datasets from one large dataset based on the column Time.
for (i in 1:nrow(position)) {
  assign(paste("position.", i, sep = ""),
         subset(dataset, Time >= position[i, 1] & Time <= position[i, 2]))
}
(position is a list which contains the start time [,1] and the stop time [,2].)
The outputs are subsets of my original dataset and look like:
position.1
position.2
position.3
....
Is it possible to add an extra column to each of the new datasets (position.1, position.2, ...) which identifies them by a number?
E.g. position.1 has an extra column with value 1, position.2 has an extra column with value 2, and so on.
I need those numbers to identify the datasets (position.1, position.2, ...) after I rbind them into one dataset again in a last step.
Upvotes: 1
Views: 280
Reputation: 1021
In addition to Thomas's recommendation to avoid side effects, you might want to take advantage of existing packages that detect overlaps. The IRanges package in Bioconductor can detect overlaps between one set of ranges (position) and another set of ranges or positions (dataset$Time). This gets you the matches between the time points and the ranges:
r <- IRanges(position[[1L]], position[[2L]])
hits <- findOverlaps(dataset$Time, r)
Now, you want to extract a subset of the dataset that overlaps each range in position. We can group the query (Time) indices by the subject (position) indices and extract a list from the dataset using that grouping:
dataset <- DataFrame(dataset)
l <- extractList(dataset, split(queryHits(hits), subjectHits(hits)))
To get the final answer, we need to combine the list elements row-wise, while adding a column that denotes their group membership:
ans <- stack(l)
Upvotes: 1
Reputation: 44565
Since you don't provide example data, this is untested, but should work for you:
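dflist <- lapply(seq_len(nrow(position)), function(x) {
  # rows whose Time falls inside the x-th [start, stop] interval
  sub <- dataset[dataset$Time >= position[x, 1] & dataset$Time <= position[x, 2], ]
  sub$val <- rep(x, nrow(sub))  # tag every row with the interval number
  sub
})
do.call(rbind, dflist)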
Basically, you never want to take the strategy you propose of assigning multiple numbered objects to the global environment. It is much easier to store all of the subsets in a list and then bind them back together using do.call(rbind, dflist). This is more efficient, produces less clutter in your workspace, and is a more "functional" style of programming.
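To make the list-based approach concrete, here is a minimal sketch on made-up data. The toy dataset, the column names start and stop, and the column name val are assumptions for illustration only; your real position object may be laid out differently.

```r
# Hypothetical toy data: ten time points and two [start, stop] intervals.
dataset  <- data.frame(Time = 1:10, value = seq(10, 100, by = 10))
position <- data.frame(start = c(2, 7), stop = c(4, 8))

# Build one subset per interval and keep them all in a list.
dflist <- lapply(seq_len(nrow(position)), function(i) {
  in_range <- dataset$Time >= position$start[i] & dataset$Time <= position$stop[i]
  sub <- dataset[in_range, ]
  sub$val <- rep(i, nrow(sub))  # interval number; also safe for empty subsets
  sub
})

# Bind the subsets back into one data frame; val records the origin of each row.
combined <- do.call(rbind, dflist)
print(combined)
```

Here the rows with Time 2 to 4 end up with val equal to 1 and the rows with Time 7 and 8 with val equal to 2, so the subsets stay identifiable after binding. If you still want to look at a single subset, dflist[[2]] plays the role of position.2.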
Upvotes: 1