Reputation: 961
I have a dataframe (B
) given below which contains some repeated entries for various ids (x_1
). For each id in x_1, I'm interested in extracting rows which have minimum values (as well as their repeats) in the second column (z_1
). For example, for A1
, I would like to extract rows 5 and 6. I would also like to keep rows that don't have more than 1 value under the z_1
column-for example, all other entries apart from A1 and B10 should be included in the ouput.
x_1 <- c("A1", "A1", "A1", "A1", "A1", "A1", "B10", "B10", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")
z_1 <- c(rep(1.87, 2), rep(1.56, 2), rep(1.15, 2), rep(1.60, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
z_2 <- c( c(0.5, 1, 0.5, 1, 0.5, 1), c(0.2, 0.4, 0.2, 0.4), c(0.3, 0.6), c(0.6, 1.2, 1.8), c(0.25, 0.5), c(0.15, 0.3))
Here's the dataframe:
B <- data.frame(x_1, z_1, z_2)
x_1 z_1 z_2
A1 1.87 0.50
A1 1.87 1.00
A1 1.56 0.50
A1 1.56 1.00
A1 1.15 0.50
A1 1.15 1.00
B10 1.60 0.20
B10 1.60 0.40
B10 1.44 0.20
B10 1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40 1.90 0.25
D40 1.90 0.50
G100 1.59 0.15
G100 1.59 0.30
Here's the result I would like to obtain:
y_1 d_1 d_2
A1 1.15 0.50
A1 1.15 1.00
B10 1.44 0.20
B10 1.44 0.40
B500 1.34 0.30
B500 1.34 0.60
C100 1.50 0.60
C100 1.50 1.20
C100 1.50 1.80
D40 1.90 0.25
D40 1.90 0.50
G100 1.59 0.15
G100 1.59 0.30
Just to add that I've tried a few things which I found from other similar posted questions but none of these is giving the desired output:
aggregate(grouped_B$x_1, by = list(grouped_B$z_1), min)
do.call("rbind", by(B, B$x_1, function(x) x[which.min(unique(x$z_1)), ]))
Also just to add that the data-frame I'm working on is a grouped data:
grouped_B <- groupedData(z_1 ~ z_2 | x_1,
data = B, FUN = mean,
labels = list( x = "duration",
y = "height"), units = list("(years)"))
I would really appreciate any useful hints/codes.
Upvotes: 0
Views: 94
Reputation: 92282
If each row in your data set is unique, you could do this simply with data.table
library(data.table)
DT <- setDT(B)[, min(z_1), by = c("x_1", "z_2")]
setnames(DT, 1:3, c("y_1", "d_2", "d_1"))
# y_1 d_2 d_1
# 1: A1 0.50 1.15
# 2: A1 1.00 1.15
# 3: B10 0.20 1.44
# 4: B10 0.40 1.44
# 5: B500 0.30 1.34
# 6: B500 0.60 1.34
# 7: C100 0.60 1.50
# 8: C100 1.20 1.50
# 9: C100 1.80 1.50
# 10: D40 0.25 1.90
# 11: D40 0.50 1.90
# 12: G100 0.15 1.59
# 13: G100 0.30 1.59
If it's not unique, could do
DT <- setDT(B)[, list(d_1 = z_1[grep(min(z_1), z_1)],
d_2 = z_2[grep(min(z_1), z_1)]), by = c("x_1")]
# x_1 d_1 d_2
# 1: A1 1.15 0.50
# 2: A1 1.15 1.00
# 3: B10 1.44 0.20
# 4: B10 1.44 0.40
# 5: B500 1.34 0.30
# 6: B500 1.34 0.60
# 7: C100 1.50 0.60
# 8: C100 1.50 1.20
# 9: C100 1.50 1.80
# 10: D40 1.90 0.25
# 11: D40 1.90 0.50
# 12: G100 1.59 0.15
# 13: G100 1.59 0.30
Upvotes: 1
Reputation: 59335
Pretty simple in base R as well:
result <- merge(B,aggregate(z_1~x_1,B,min))
result
# x_1 z_1 z_2
# 1 A1 1.15 0.50
# 2 A1 1.15 1.00
# 3 B10 1.44 0.20
# 4 B10 1.44 0.40
# 5 B500 1.34 0.30
# 6 B500 1.34 0.60
# 7 C100 1.50 0.60
# 8 C100 1.50 1.20
# 9 C100 1.50 1.80
# 10 D40 1.90 0.25
# 11 D40 1.90 0.50
# 12 G100 1.59 0.15
# 13 G100 1.59 0.30
Upvotes: 1