Kaye11

Subsetting at a specific sequence of time points in R: can I use seq?

I have a data frame that looks like this:

binA <- structure(list(A = c(70, 70, 70, 70, 70, 70), T = c(0.1, 0.2, 
0.3, 0.4, 0.5, 0.6), X = c(434.01, 434.01, 434.75, 434.75, 434.75, 
434.01), Y = c(454.92, 454.92, 454.92, 454.92, 454.18, 454.92
), V = c(0, 0, 21.128, 0, 14.94, 14.94), thetarad = c(0.151841552716899, 
0.151841552716899, 0.150990672182432, 0.150990672182432, 0.150177486839524, 
0.151841552716899), thetadeg = c(8.69988012340509, 8.69988012340509, 
8.6511282599214, 8.6511282599214, 8.6045361718215, 8.69988012340509
)), .Names = c("A", "T", "X", "Y", "V", "thetarad", "thetadeg"
), row.names = 1423:1428, class = "data.frame")

I want to subset specific time points in R with intervals of 30 sec. I can do this by manually subsetting each time point that I want:

a1=subset(binA, T==0.1)
a2=subset(binA, T==30)
a3=subset(binA, T==60)
a4=subset(binA, T==90)
a5=subset(binA, T==120)
a6=subset(binA, T==150)
a7=subset(binA, T==180)
a8=subset(binA, T==210)
a9=subset(binA, T==240)
a10=subset(binA, T==270)
a11=subset(binA, T==300)
a12=subset(binA, T==330)
a13=subset(binA, T==360)
a14=subset(binA, T==390)
a15=subset(binA, T==420)
a16=subset(binA, T==450)
a17=subset(binA, T==480)
a18=subset(binA, T==510)
a19=subset(binA, T==540)
a20=subset(binA, T==570)
a21=subset(binA, T==599.5)

I tried subsetting using sapply and the seq function but got confusing results. I also want to count the unique A values in each subset of data. I know I can do this using the count function in the plyr package:

a1=count(unique(subset(binA, T==0.1)))

but count works on one data frame, not on multiple ones (correct me if I am wrong). I also want to take the mean of thetadeg for each subset (this would be easy with sapply on a single data frame). So I need help writing a function that works over a specific sequence of time points.

I know this problem is trivial, but help would be appreciated.

Thanks

Answers (4)

IRTFM

The function I think you want is split:

subsetted.by.T <- split(dfrm, dfrm$T)
lapply(subsetted.by.T, nrow)

$`0.1`
[1] 1

$`0.2`
[1] 1

$`0.3`
[1] 1

$`0.4`
[1] 1

$`0.5`
[1] 1

$`0.6`
[1] 1

> subsetted.by.T[[1]]
      A   T      X      Y V  thetarad thetadeg
1423 70 0.1 434.01 454.92 0 0.1518416  8.69988

If you want to name these individual items, then the names<- function would be appropriate:

names(subsetted.by.T) <- paste0("a", seq(length(subsetted.by.T) ) )
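Because split() names each list element after the corresponding value of T, only the wanted time points can be pulled out by name and then summarized with sapply. A minimal sketch, assuming T is in seconds; the small data frame and the names wanted, keep, selected, n.unique and mean.theta are hypothetical:

```r
# Hypothetical sample data (assumption: T is numeric seconds)
dfrm <- data.frame(
  A = c(70, 70, 71, 72, 72),
  T = c(0.1, 0.2, 30, 30, 60),
  thetadeg = c(8.70, 8.70, 8.65, 8.60, 8.69)
)

subsetted.by.T <- split(dfrm, dfrm$T)

# Wanted time points: 0.1, then every 30 s up to 570, then 599.5
wanted <- c(0.1, seq(30, 570, by = 30), 599.5)

# split() names elements by the factor levels of T (as.character(T)),
# so wanted times can be matched by name; intersect() drops times with no rows
keep <- intersect(as.character(wanted), names(subsetted.by.T))
selected <- subsetted.by.T[keep]

n.unique <- sapply(selected, function(d) length(unique(d$A)))  # unique A per subset
mean.theta <- sapply(selected, function(d) mean(d$thetadeg))   # mean thetadeg per subset
n.unique
mean.theta
```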

If the "T" column were somewhat irregular in its values, then perhaps using cut to create categories at regular breaks would be useful for the purpose of splitting. The question might be clarified if "T" were actually a time value. At the moment it's a "numeric" value, but there are cut methods for datetime classes.
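A sketch of the cut approach, assuming T is numeric seconds running from 0 to 600; the sample values and the bin, n.unique and mean.theta names are hypothetical. Each row is assigned to a 30-second bin, and tapply computes both summaries per bin:

```r
# Hypothetical irregular times (assumption: T is numeric seconds, 0 to 600)
df <- data.frame(
  A = c(70, 70, 71, 70, 72, 71),
  T = c(0.1, 12.4, 29.9, 30.2, 61.0, 599.5),
  thetadeg = c(8.70, 8.69, 8.65, 8.65, 8.60, 8.70)
)

# 30-second bins; right = FALSE makes each bin [start, start + 30),
# and include.lowest = TRUE closes the last bin so it becomes [570, 600]
breaks <- seq(0, 600, by = 30)
df$bin <- cut(df$T, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Unique A per bin and mean thetadeg per bin (bins with no rows give NA)
n.unique <- tapply(df$A, df$bin, function(a) length(unique(a)))
mean.theta <- tapply(df$thetadeg, df$bin, mean)
n.unique
```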

Ananta

If the purpose is just to get the average, the unique count, etc., you don't need to subset. One more thing: is T a factor, or is it continuous so that you need to make bins? Here I am assuming a factor.

Here is one approach with plyr:

library(plyr)
ddply(df, ~T, summarise, l = length(unique(A)))
ddply(df, ~T, summarise, m = mean(thetadeg))
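The same two summaries can also be computed in base R with aggregate, after first restricting the data to the wanted time points. This is a swapped-in base R alternative to the ddply calls, not plyr itself; the sample data and object names are hypothetical, and rounding T to one decimal place is an assumption about how T was recorded:

```r
# Hypothetical sample data (assumption: T is numeric seconds)
df <- data.frame(
  A = c(70, 71, 70, 70, 72, 71),
  T = c(0.1, 0.1, 0.2, 30, 30, 599.5),
  thetadeg = c(8.70, 8.66, 8.69, 8.65, 8.61, 8.70)
)

wanted <- c(0.1, seq(30, 570, by = 30), 599.5)

# Keep only the wanted time points; round() guards against tiny
# floating-point differences in T (assumes T has one decimal place)
sub <- df[round(df$T, 1) %in% wanted, ]

# Base R counterparts of the two ddply calls
n.unique <- aggregate(A ~ T, data = sub, FUN = function(a) length(unique(a)))
mean.theta <- aggregate(thetadeg ~ T, data = sub, FUN = mean)

# One row per time point with both summaries
result <- merge(n.unique, mean.theta, by = "T")
names(result) <- c("T", "n.unique.A", "mean.thetadeg")
result
```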

TheComeOnMan

You should be able to use the following code to get what you want. This doesn't look for 0.1 and 599.5, but that should be easy to change.

timeintervals <- seq(0, 600, 30)
for (i in seq_along(timeintervals))
{
  # create the subsets for each time interval
  assign(
    paste0("a", i),
    df[df$T == timeintervals[i], ]
  )

  # get all unique As
  assign(
    paste0("b", i),
    unique(df[df$T == timeintervals[i], "A"])
  )

}
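A variant of the loop that includes 0.1 and 599.5 and keeps the subsets in one named list rather than creating 21 separate objects with assign(). A minimal sketch; the sample data and the subsets, uniqueA and n.unique names are hypothetical:

```r
# Hypothetical sample data (assumption: T is numeric seconds)
df <- data.frame(
  A = c(70, 71, 70, 72),
  T = c(0.1, 0.1, 30, 599.5),
  thetadeg = c(8.70, 8.66, 8.65, 8.70)
)

# 0.1, then every 30 s up to 570, then 599.5 (21 time points in total)
timeintervals <- c(0.1, seq(30, 570, by = 30), 599.5)

# One list element per time point instead of a1, a2, ... via assign()
subsets <- lapply(timeintervals, function(t) df[df$T == t, ])
names(subsets) <- paste0("a", seq_along(timeintervals))

# Unique A values per subset (the "b" objects above) and their counts
uniqueA <- lapply(subsets, function(d) unique(d$A))
n.unique <- sapply(uniqueA, length)
n.unique
```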

zx8754

Assuming the data is in a data frame called df, try this:

sapply(c(0.1,seq(30,599,30),599.5),
       function(x)
         length(unique(df[ df$T==x, "A"])))
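The same sapply pattern can return the mean of thetadeg alongside the unique count. This sketch also swaps == for a tolerance-based comparison, which is an assumption worth making if T comes from floating-point arithmetic; the data and the times and stats names are hypothetical. Time points with no rows give a count of 0 and a mean of NaN:

```r
# Hypothetical sample data (assumption: T is numeric seconds)
df <- data.frame(
  A = c(70, 71, 70, 72),
  T = c(0.1, 0.1, 30, 599.5),
  thetadeg = c(8.70, 8.66, 8.65, 8.70)
)

times <- c(0.1, seq(30, 599, 30), 599.5)

# For each time point: number of unique A and mean thetadeg.
# abs(...) < 1e-8 is a tolerance-based match instead of ==
stats <- sapply(times, function(x) {
  rows <- df[abs(df$T - x) < 1e-8, ]
  c(n.unique.A = length(unique(rows$A)),
    mean.thetadeg = mean(rows$thetadeg))
})
colnames(stats) <- times
t(stats)  # one row per time point
```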
