Reputation: 91
I have data that looks like this:
time <- c(1:20)
temp <- c(2,3,4,5,6,2,3,4,5,6,2,3,4,5,6,2,3,4,5,6)
data <- data.frame(time,temp)
This is a very basic representation of my data. If you plot it, you can easily see that there are 4 up-sloping groups of data. I want to split the original data frame into these 4 "subsets" so that I can run calculations on them, like "mean", "max", "min" and "std". I'd like to use split(), but it will only split based on factor levels. I'd like to be able to feed split() a conditional statement, such as split if: diff(data$temp) > -2.
My problem is actually much more complex than this, but is there a function like split() that will allow me to create new data frames based on a conditional statement, as opposed to splitting based on factor levels?
Thanks all!
Upvotes: 5
Views: 3188
Reputation: 3210
If your data isn't as well behaved, you can use cut() to create the categorical variable. The only drawback is that the break points have to be chosen by hand.
time <- c(1:200)
temp <- (time %% 51) * (-1)^(time %/% 51) + rnorm(200)
data <- data.frame(time,temp)
layout(matrix(c(1, 1, 2, 2, 3, 4, 5 ,6), nrow=2))
plot(data, main='All data')
time2 <- cut(time, c(0, 50, 101, 152, 200))
plot(data, col=time2, main='All data, by time2')
data2 <- split(data, time2)
for (i in 1:4) {
plot(data2[[i]], main=names(data2)[i])
}
EDIT:
Now a 100% automatic process:
time <- c(1:200)
temp <- (time %% 51) * (-1)^(time %/% 51) + rnorm(200)
data <- data.frame(time,temp)
layout(matrix(c(1, 1, 2, 2, 3, 4, 5 ,6), nrow=2))
plot(data, main='All data')
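tol <- 10 # Minimum absolute jump between consecutive temp values to treat as a structural break
# The default right=TRUE gives intervals (a, b], so the row just before each jump closes its group
time2 <- cut(time, c(0, which(abs(diff(data$temp)) >= tol), max(time)))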
plot(data, col=time2, main='All data, by time2')
data2 <- split(data, time2)
for (i in 1:4) {
plot(data2[[i]], main=names(data2)[i])
}
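The automatic version above relies on time being identical to the row index. A minimal sketch of the same idea that works on row order instead, assuming the rows are already sorted: the helper name split_at_jumps and its arguments are hypothetical, not part of base R.

```r
# Hypothetical helper: start a new group wherever the absolute change
# between consecutive values of `column` is at least `tol`.
split_at_jumps <- function(df, column, tol) {
  jump <- abs(diff(df[[column]])) >= tol
  # The first row always opens group 0; each jump increments the group id
  group <- cumsum(c(FALSE, jump))
  split(df, group)
}

set.seed(1)  # fixed seed so the noise is reproducible
time <- 1:200
temp <- (time %% 51) * (-1)^(time %/% 51) + rnorm(200)
data <- data.frame(time, temp)

data2 <- split_at_jumps(data, "temp", tol = 10)
sapply(data2, nrow)  # number of rows in each piece
```

Because the grouping comes from cumsum() rather than cut(), no break vector or interval-closing rule is needed, and the number of pieces does not have to be known in advance.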
Upvotes: 0
Reputation: 13363
The trick is to convert your conditional statement into something that can be construed as a factor. In this particular example:
tmp <- c(1,diff(data[[2]]))
# [1] 1 1 1 1 1 -4 1 1 1 1 -4 1 1 1 1 -4 1 1 1 1
tmp2 <- tmp < 0
# [1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
# [13] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
tmp3 <- cumsum(tmp2)
# [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
split(data, tmp3)
# $`0`
#   time temp
# 1    1    2
# 2    2    3
# 3    3    4
# 4    4    5
# 5    5    6
#
# $`1`
#    time temp
# 6     6    2
# 7     7    3
# 8     8    4
# 9     9    5
# 10   10    6
#
# $`2`
#    time temp
# 11   11    2
# 12   12    3
# 13   13    4
# 14   14    5
# 15   15    6
#
# $`3`
#    time temp
# 16   16    2
# 17   17    3
# 18   18    4
# 19   19    5
# 20   20    6
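Once the data is split, the per-group calculations mentioned in the question can be run over the resulting list. A minimal sketch on the question's example data: the names group, pieces and stats are only illustrative.

```r
# Rebuild the example data from the question
time <- 1:20
temp <- rep(2:6, times = 4)
data <- data.frame(time, temp)

# A new group starts wherever temp drops relative to the previous row
group <- cumsum(c(FALSE, diff(data$temp) < 0))

# Split on the group id, then summarise temp within each piece
pieces <- split(data, group)
stats <- t(sapply(pieces, function(d) {
  c(mean = mean(d$temp), max = max(d$temp), min = min(d$temp), sd = sd(d$temp))
}))
stats  # one row per group, one column per statistic
```

If only the summaries are needed, tapply(data$temp, group, mean) or aggregate() would avoid the intermediate list, though split() is handier when each piece needs several different calculations.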
Upvotes: 4