Ewa Karolina P

Reputation: 11

How can I subset my data into intervals?

I'm new to R and I'm trying to make my script more efficient. I have a data.frame of 25480 observations and 17 variables.

One of my variables is Subject, and each subject has its own number. However, the number of observations (rows) for each subject is not equal. I would like to separate my subjects into groups according to their number. How can I do it?

Previously I used this command:

gaze <- subset(gaze, Subject != "261" & Subject != "270" & Subject != "275") 

But now I have too many subjects to repeat Subject each time. Is it possible to define an interval of subjects to cut or to split? I tried these commands, but they don't seem to work:

gazeS <- (gaze$Subject[112:216])
cut(gaze, seq(gaze, from = 112, to = 116))

Could you help me fix this code, please?

Upvotes: 1

Views: 7115

Answers (2)

IRTFM

Reputation: 263301

Factor variables have no numeric ordering (even if their levels look numeric), so you need to convert them first for any ordering operation to work. The R FAQ says to use:

as.numeric(as.character(fac))

So:

subset(gaze, !as.numeric(as.character(Subject)) %in% 260:280)

Or:

subset(gaze, !( as.numeric(as.character(Subject)) >= 260 &
            as.numeric(as.character(Subject)) <= 280)  )

Or:

subset( gaze, !Subject %in% as.character(260:280) )
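
To see why the conversion goes through as.character, here is a minimal sketch on a toy data frame. The values and the Score column are made up for illustration and only stand in for the asker's gaze data; the helper names codes, ids and kept are likewise hypothetical.

```r
# Toy data standing in for the asker's `gaze` data frame (hypothetical values).
gaze <- data.frame(
  Subject = factor(c("112", "261", "270", "275", "300")),
  Score   = c(1.2, 3.4, 5.6, 7.8, 9.0)
)

# Converting the factor directly returns the internal level codes
# (here 1 2 3 4 5), not the subject numbers.
codes <- as.numeric(gaze$Subject)

# Going through character first recovers the actual numbers
# (112 261 270 275 300).
ids <- as.numeric(as.character(gaze$Subject))

# Drop every subject whose number falls in the interval 260..280.
kept <- subset(gaze, !(ids >= 260 & ids <= 280))
print(kept)
```

With this toy data only subjects 112 and 300 remain in kept.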

Upvotes: 1

nico

Reputation: 51640

If I understand correctly what you need, you could use something like

gaze$Subject <- as.integer(as.character(gaze$Subject))
gaze <- subset(gaze, Subject >= 261 & Subject <= 280) 

It is important to convert the id to character first; otherwise as.integer returns the factor's internal level codes, which follow the alphabetical (not numerical) order of the levels, rather than the subject numbers. The best way to avoid this, however, is to set the column classes directly when reading the data (e.g. with the colClasses parameter of read.table).
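
As a rough sketch of the colClasses approach, assuming a whitespace-separated file with a header row: the file contents below are invented and passed in through the text argument so the example is self-contained, and the Score column is hypothetical. With a real file you would pass its path instead.

```r
# Hypothetical file contents, inlined via `text=` so the sketch runs as-is.
raw <- "Subject Score
112 1.2
261 3.4
270 5.6
300 9.0"

# Declare the column classes up front so Subject is read as integer,
# never as a factor.
gaze <- read.table(text = raw, header = TRUE,
                   colClasses = c(Subject = "integer", Score = "numeric"))

# Numeric comparisons now work directly on the column.
kept <- subset(gaze, Subject >= 261 & Subject <= 280)
print(kept)
```

Because Subject is already an integer column, no as.character round trip is needed before filtering.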

Upvotes: 0
