Reputation: 17
I am new to R and I am trying to get the minimum Distance value, along with the corresponding "Record2_ID" value, for every unique "Record1_ID" value in the data frame below:
Record1_ID Record2_ID Distance
6 10_Bil 0.95337476
6 11_Bla 0.852558044
6 12_Bon 1
6 13_Bra 1
684 78_Lip 0.957437173
684 79_Lip 1
684 80_Liv 0.950852681
684 81_Lun 0.914874347
3065 136_Pri 1
3065 137_Pro 0.895742793
3065 138_Rec 0.895742793
3065 139_Ren 0.934061953
I used tapply(x$Distance_Cosine, cosine_dist_type_data$Record1_rowID, min), but with tapply I am not getting the "Record2_rowID" values. Ideally the output should be:
Record1_ID Record2_ID Min_Distance
6 11_Bla 0.852558044
684 81_Lun 0.914874347
3065 137_Pro 0.895742793
Can this be done using sapply or any other function? Thanks for the help.
Upvotes: 0
Views: 1017
Reputation: 3656
library(data.table)
df <- data.table(read.table(header = TRUE, text = "
Record1_ID Record2_ID Distance
6 10_Bil 0.95337476
6 11_Bla 0.852558044
6 12_Bon 1
6 13_Bra 1
684 78_Lip 0.957437173
684 79_Lip 1
684 80_Liv 0.950852681
684 81_Lun 0.914874347
3065 136_Pri 1
3065 137_Pro 0.895742793
3065 138_Rec 0.895742793
3065 139_Ren 0.934061953
"))
df[, Min_Distance := min(Distance), by = Record1_ID]
df[Distance == Min_Distance,]
Or slightly more straightforward:
df[, .SD[Distance == min(Distance)], by=Record1_ID]
.SD contains the Subset of Data for each group. We just select the rows of that subset where Distance equals min(Distance).
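Note that filtering with Distance == min(Distance) keeps every tied row, so group 3065 yields two rows, whereas the desired output in the question shows one row per Record1_ID. A minimal sketch of a one-row-per-group variant, assuming that the first tied row is the one to keep (the name first_min and the rebuilt sample table are only for illustration):

```r
library(data.table)

# Sample data from the question, rebuilt so the snippet runs on its own
df <- data.table(
  Record1_ID = rep(c(6, 684, 3065), each = 4),
  Record2_ID = c("10_Bil", "11_Bla", "12_Bon", "13_Bra",
                 "78_Lip", "79_Lip", "80_Liv", "81_Lun",
                 "136_Pri", "137_Pro", "138_Rec", "139_Ren"),
  Distance   = c(0.95337476, 0.852558044, 1, 1,
                 0.957437173, 1, 0.950852681, 0.914874347,
                 1, 0.895742793, 0.895742793, 0.934061953)
)

# which.min() returns the position of the first minimum only,
# so tied minima collapse to a single row per group
first_min <- df[, .SD[which.min(Distance)], by = Record1_ID]

# Rename the column to match the output asked for in the question
setnames(first_min, "Distance", "Min_Distance")
first_min
```

If all tied rows are wanted instead, the Distance == min(Distance) form above is the one to use.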
Upvotes: 2
Reputation: 6449
Or without plyr:
# which.min() keeps only the first minimum row in each group
blah <- lapply(split(df, df["Record1_ID"]), function(x) x[which.min(x$Distance),])
min_vals.df <- do.call(rbind, blah)
# Alternatively, subset() keeps every row tied for the minimum
blah <- lapply(split(df, df["Record1_ID"]), function(x) subset(x, Distance==min(Distance)))
min_vals.df <- do.call(rbind, blah)
Upvotes: 1
Reputation: 145755
Or with dplyr:
library(dplyr)
# %>% replaces the old %.% operator, which has been removed from dplyr
df %>% group_by(Record1_ID) %>% filter(Distance == min(Distance))
Upvotes: 0
Reputation: 67778
Or you may use the base function ave:
df[df$Distance == ave(df$Distance, df$Record1_ID, FUN = min), ]
# Record1_ID Record2_ID Distance
# 2 6 11_Bla 0.8525580
# 8 684 81_Lun 0.9148743
# 10 3065 137_Pro 0.8957428
# 11 3065 138_Rec 0.8957428
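As the output shows, the tie in group 3065 produces two rows. If exactly one row per Record1_ID is needed, a base R sketch along these lines may help. It assumes the first tied row in the original order should be kept; the names ord and first_min are only for illustration:

```r
# Sample data from the question
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Record1_ID Record2_ID Distance
6 10_Bil 0.95337476
6 11_Bla 0.852558044
6 12_Bon 1
6 13_Bra 1
684 78_Lip 0.957437173
684 79_Lip 1
684 80_Liv 0.950852681
684 81_Lun 0.914874347
3065 136_Pri 1
3065 137_Pro 0.895742793
3065 138_Rec 0.895742793
3065 139_Ren 0.934061953
")

# Sort by group, then by Distance; ties stay in their original order
ord <- df[order(df$Record1_ID, df$Distance), ]

# The first row of each group is now its minimum, so drop later duplicates
first_min <- ord[!duplicated(ord$Record1_ID), ]
first_min
```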
Upvotes: 2
Reputation:
If that's a dataframe, you want to look at plyr, specifically the ddply function. Not tremendously elegant, but try...
library(plyr)
min_vals.df <- ddply(.data = df,
                     .variables = "Record1_ID",
                     .fun = function(x){
                       return(x[x$Distance == min(x$Distance), c("Record2_ID","Distance")])
                     })
plyr and its successor, dplyr, are "apply for data frames": they split the data by each unique combination of the grouping variables (here, .variables) and run whatever function you want on each resulting piece.
Upvotes: 1