Extracting rows from selected column in repeated/grouped data

Question

I have a dataframe (B) given below which contains some repeated entries for various ids (x_1). For each id in x_1, I'm interested in extracting rows which have minimum values (as well as their repeats) in the second column (z_1). For example, for A1, I would like to extract rows 5 and 6. I would also like to keep rows that don't have more than 1 value under the z_1 column-for example, all other entries apart from A1 and B10 should be included in the ouput.

x_1 <- c("A1", "A1", "A1", "A1", "A1", "A1", "B10", "B10", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")
z_1 <- c(rep(1.87, 2), rep(1.56, 2), rep(1.15, 2), rep(1.60, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
z_2 <- c( c(0.5, 1, 0.5, 1, 0.5, 1), c(0.2, 0.4, 0.2, 0.4), c(0.3, 0.6), c(0.6, 1.2, 1.8), c(0.25, 0.5), c(0.15, 0.3))

Here's the dataframe:

B <- data.frame(x_1, z_1, z_2)

x_1  z_1  z_2
 A1  1.87 0.50
 A1  1.87 1.00
 A1  1.56 0.50
 A1  1.56 1.00
 A1  1.15 0.50
 A1  1.15 1.00
B10  1.60 0.20
B10  1.60 0.40
B10  1.44 0.20
B10  1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40  1.90 0.25
D40  1.90 0.50
G100 1.59 0.15
G100 1.59 0.30

Here's the result I would like to obtain:

y_1  d_1  d_2
A1   1.15 0.50
A1   1.15 1.00
B10  1.44 0.20
B10  1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40  1.90 0.25
D40  1.90 0.50
G100 1.59 0.15
G100 1.59 0.30

Just to add that I've tried a few things which I found from other similar posted questions but none of these is giving the desired output:

aggregate(grouped_B$x_1, by = list(grouped_B$z_1), min)
do.call("rbind", by(B, B$x_1, function(x) x[which.min(unique(x$z_1)), ]))

Also just to add that the data-frame I'm working on is a grouped data:

grouped_B <- groupedData(z_1 ~ z_2 | x_1,      
                         data = B, FUN = mean,
                         labels = list( x = "duration",
                         y = "height"), units = list("(years)"))

I would really appreciate any useful hints/codes.

jlhoward · Accepted Answer

Pretty simple in base R as well:

result <- merge(B,aggregate(z_1~x_1,B,min))
result
#     x_1  z_1  z_2
# 1    A1 1.15 0.50
# 2    A1 1.15 1.00
# 3   B10 1.44 0.20
# 4   B10 1.44 0.40
# 5  B500 1.34 0.30
# 6  B500 1.34 0.60
# 7  C100 1.50 0.60
# 8  C100 1.50 1.20
# 9  C100 1.50 1.80
# 10  D40 1.90 0.25
# 11  D40 1.90 0.50
# 12 G100 1.59 0.15
# 13 G100 1.59 0.30

Extracting rows from selected column in repeated/grouped data

Answers (2)

Related Questions