Reputation: 13123
I am attempting to number the observations within group and then identify the first and last observation within each group. I know how to do this and have even posted the solution here in response to a past question.
However, I have now stumbled upon a situation in which my solution does not work and I cannot figure out why. Thank you for any advice on how to get the second example below to work. The first example does work. Sorry if I am overlooking a silly typo.
I prefer using base R.
####################################################################
# this works
my.df = read.table(text = '
state county city miles
1 1 1 3
1 1 1 4
1 1 1 4
1 1 1 5
1 1 2 4
1 1 2 3
1 2 1 4
1 2 2 2
1 2 2 4
1 2 2 3
1 2 3 3
1 2 3 2
', header = TRUE)
my.df
str(my.df)
my.seq <- data.frame(rle(my.df$city)$lengths)
my.seq
my.df$first <- unlist(apply(my.seq, 1, function(x) seq(1,x)))
my.df$last <- unlist(apply(my.seq, 1, function(x) seq(x,1,-1)))
my.df
my.df2 <- my.df[my.df$first==1 | my.df$last == 1,]
my.df2
####################################################################
# This does not work. Only the data set has changed.
my.df <- read.table(text = '
state county city miles
40 8 1 12
40 8 1 4
40 8 2 13
40 8 2 3
', header = TRUE)
my.df
str(my.df)
my.seq <- data.frame(rle(my.df$city)$lengths)
my.seq
my.df$first <- unlist(apply(my.seq, 1, function(x) seq(1,x)))
my.df$last <- unlist(apply(my.seq, 1, function(x) seq(x,1,-1)))
my.df
my.df2 <- my.df[my.df$first==1 | my.df$last == 1,]
my.df2
# The expected result with the second example is:
desired.result <- read.table(text = '
state county city miles first last
40 8 1 12 1 2
40 8 1 4 2 1
40 8 2 13 1 2
40 8 2 3 2 1
', header = TRUE)
####################################################################
Upvotes: 2
Views: 159
Reputation: 13123
I figured out how to modify my code so that I get the desired answer with both example data sets.
I simply added as.vector()
around my two unlist(apply())
statements. Here is the code for the second example:
my.df <- read.table(text = '
state county city miles
40 8 1 12
40 8 1 4
40 8 2 13
40 8 2 3
', header = TRUE)
my.df
str(my.df)
my.seq <- data.frame(rle(my.df$city)$lengths)
my.seq
my.df$first <- as.vector(unlist(apply(my.seq, 1, function(x) seq(1,x))))
my.df$last <- as.vector(unlist(apply(my.seq, 1, function(x) seq(x,1,-1))))
my.df
my.df2 <- my.df[my.df$first==1 | my.df$last == 1,]
my.df2
Here is the result:
state county city miles first last
1 40 8 1 12 1 2
2 40 8 1 4 2 1
3 40 8 2 13 1 2
4 40 8 2 3 2 1
Upvotes: 1
Reputation: 121608
Ii was complicated to understand what do you try to do.
I think you get an error because of particular case of unique city !
here how I would do this:
The difficulty here is to create the grouping variable:
xx <- rle(my.df$city)
my.df$group <- rep(seq_along(xx$values),xx$lengths)
Then using ddply
, you take the first and the last of each group:
library(plyr)
res <- ddply(my.df,.(group),function(x){
y <- rbind(head(x,1),tail(x,1))
cbind(y,data.frame(first=c(1,nrow(x)),
last = c(nrow(x),1)))
})
Finally you remove duplicated elements using unique
:
unique(res)
state county city miles group first last
1 40 8 1 12 1 1 2
2 40 8 1 4 1 2 1
3 40 8 2 13 2 1 2
4 40 8 2 3 2 2 1
EDIT base R solution , just replace ddply
by tapply
:
group <- rep(seq_along(xx$values),xx$lengths)
tapply(my.df,group,function(x){
y <- rbind(head(x,1),tail(x,1))
cbind(y,data.frame(first=c(1,nrow(x)),
last = c(nrow(x),1)))
})
unique(res)
Upvotes: 2