Reputation: 23
Each "KCL_ID" has several different "DAYS",KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L......), DAYS = c(1449, 1814, 582, 582, 947, 183......)
what I want to do is, to use aggregate , and get the smallest "DAYS" for each"KCL_ID", and list "BMI"BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7......)
which is correspond to the smallest "DAYS" correctly
So, what I want to get is:
KCL_ID DAYS BMI
1 21 582 32.0
2 22 183 25.7
3 61 0 21.0
4 62 0 38.5
5 71 109 27.2
6 72 109 22.6
But when I use this code:
aggregate(DAYS~KCL_ID+BMI, data = A, min)
I get following result:
KCL_ID BMI DAYS
1 61 20.1 399
2 61 21.0 0
3 72 22.6 109
4 72 23.9 474
5 62 24.2 1461
6 61 25.0 365
7 22 25.2 947
8 22 25.3 582
9 22 25.7 183
10 61 25.8 731
11 62 26.3 1096
12 21 26.4 1449
13 61 26.4 1096
14 61 26.8 1461
15 62 27.2 1826
16 71 27.2 109
17 62 27.5 2192
18 71 28.6 474
19 21 28.7 1814
20 71 29.2 840
21 61 29.5 2192
22 61 29.7 1826
23 21 32.0 582
24 62 34.8 731
25 62 37.0 365
26 62 38.5 0
Does anyone know how to solve this question?
Upvotes: 2
Views: 50
Reputation: 1343
merge(aggregate(DAYS ~ KCL_ID, A, min),
A)
Explanation
use aggregate
to get the min
of DAYS
, grouped only by KCL_ID
.
use merge
to add back the original columns that match this subset. (It defaults to an inner join, which works in this case, because all rows in the subset are in the original and you don't want to add rows of the original that do not match the subset. Also, by default merge
joins by overlapping column names, which is perfect).
Upvotes: 1
Reputation: 20095
OP is pretty close to the solution. The formula should be kept as DAYS ~ KCL_ID
in aggregate
. And then merge on both DAYS
and KCL_ID
columns with original dataframe.
merge(df, aggregate(DAYS ~ KCL_ID, data = df, min), by = c("DAYS", "KCL_ID"))
# DAYS KCL_ID BMI
# 1 183 22 25.7
# 2 582 21 32.0
Data
df <- data.frame(KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L),
DAYS = c(1449, 1814, 582, 582, 947, 183),
BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7))
Upvotes: 0
Reputation: 179
The Dplyr Package in R is pretty useful for this one imo.
library(dplyr)
dfx <- data.frame(
KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L),
DAYS = c(1449, 1814, 582, 582, 947, 183),
BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7)
)
dfx %>% #read the %>% as "then do"
group_by(KCL_ID)%>%
summarise( smallestDays = min(DAYS))%>% # find the smallest days
left_join(dfx, by = c("smallestDays" ="DAYS", "KCL_ID" = "KCL_ID"))%>% as.data.frame() #Joining on both columns should give you what you need
Is this what you want?
KCL_ID smallestDays BMI
1 21 582 32.0
2 22 183 25.7
Upvotes: 0