Reputation: 11
I'm new to R and I'm trying to make my script more efficient. I have a data.frame of 25480 observations and 17 variables. One of my variables is Subject, and each subject has its own number. However, the number of observations (lines) per subject is not equal. I would like to separate my subjects into groups according to their number. How can I do it?
Previously I used this command:
gaze <- subset(gaze, Subject != "261" & Subject != "270" & Subject != "275")
But now I have too many subjects to repeat Subject each time. Is it possible to define an interval of subjects with cut or split? I tried this command, but it doesn’t seem to work:
gazeS <- (gaze$Subject[112:216])
cut(gaze, seq(gaze, from = 112, to = 116))
Could you help me to fix this code, please?
Upvotes: 1
Views: 7115
Reputation: 263301
Since there is no ordering method for factor variables (even if they appear numeric), you need to convert them first for any ordering operation to work. The R FAQ says to use:
as.numeric(as.character(fac))
So:
subset(gaze, !(as.numeric(as.character(Subject)) %in% 260:280))
Or:
subset(gaze, !( as.numeric(as.character(Subject)) >= 260 &
as.numeric(as.character(Subject)) <= 280) )
Or:
subset( gaze, !Subject %in% as.character(260:280) )
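To see why the conversion goes through character, and to check that the three filters agree, here is a minimal sketch on a toy data frame. The subject numbers and the `x` column are made up for illustration, and `Subject` is assumed to be stored as a factor, as it may be in the real data.

```r
# Toy data; the subject numbers and the 'x' column are hypothetical.
gaze <- data.frame(
  Subject = factor(c("95", "261", "270", "275", "300")),
  x = 1:5
)

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(gaze$Subject)

# Going through character first recovers the actual subject numbers.
values <- as.numeric(as.character(gaze$Subject))

# Three ways to drop subjects 260 to 280; they should keep the same rows.
kept_a <- subset(gaze, !(as.numeric(as.character(Subject)) %in% 260:280))
kept_b <- subset(gaze, !(as.numeric(as.character(Subject)) >= 260 &
                         as.numeric(as.character(Subject)) <= 280))
kept_c <- subset(gaze, !(Subject %in% as.character(260:280)))

print(codes)
print(values)
print(kept_a)
```

Comparing `codes` with `values` shows the pitfall: the level codes follow the alphabetical order of the labels, so they do not match the subject numbers.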
Upvotes: 1
Reputation: 51640
If I correctly understand what you need, you could use something like
gaze$Subject <- as.integer(as.character(gaze$Subject))
gaze <- subset(gaze, Subject >= 261 & Subject <= 280)
It is important to convert the ID to character first; otherwise you get the factor's internal level codes, which are ordered alphabetically rather than numerically, instead of the actual subject numbers. The best way to avoid this, however, is to set the column classes directly when reading the data (e.g. with the colClasses parameter of read.table).
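As a rough illustration of the colClasses suggestion, the sketch below reads a small inline table instead of a file. The data, the `x` column, and the choice of classes are assumptions made for the example; for the real data set the vector would need one entry per column (17 here).

```r
# Hypothetical inline data standing in for the real file.
raw <- "Subject x
95 1
261 2
270 3
300 4"

# colClasses fixes the type of each column at read time,
# so Subject is an integer from the start rather than a factor.
gaze <- read.table(text = raw, header = TRUE,
                   colClasses = c("integer", "numeric"))

# Numeric comparisons on Subject now work without any conversion.
kept <- subset(gaze, Subject >= 261 & Subject <= 280)
print(kept)
```

With the column typed on input, a range filter like the one above needs no as.character() step at all.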
Upvotes: 0