Tseng
Tseng

Reputation: 23

About aggregate in R

Each "KCL_ID" has several different "DAYS",KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L......), DAYS = c(1449, 1814, 582, 582, 947, 183......) what I want to do is, to use aggregate , and get the smallest "DAYS" for each"KCL_ID", and list "BMI"BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7......) which is correspond to the smallest "DAYS" correctly So, what I want to get is:

   KCL_ID DAYS BMI 
1     21  582  32.0  
2     22  183  25.7
3     61    0  21.0
4     62    0  38.5
5     71  109  27.2
6     72  109  22.6

But when I use this code:

aggregate(DAYS~KCL_ID+BMI, data = A, min)

I get following result:

   KCL_ID  BMI DAYS
1      61 20.1  399
2      61 21.0    0
3      72 22.6  109
4      72 23.9  474
5      62 24.2 1461
6      61 25.0  365
7      22 25.2  947
8      22 25.3  582
9      22 25.7  183
10     61 25.8  731
11     62 26.3 1096
12     21 26.4 1449
13     61 26.4 1096
14     61 26.8 1461
15     62 27.2 1826
16     71 27.2  109
17     62 27.5 2192
18     71 28.6  474
19     21 28.7 1814
20     71 29.2  840
21     61 29.5 2192
22     61 29.7 1826
23     21 32.0  582
24     62 34.8  731
25     62 37.0  365
26     62 38.5    0

Does anyone know how to solve this question?

Upvotes: 2

Views: 50

Answers (3)

Jordi
Jordi

Reputation: 1343

merge(aggregate(DAYS ~ KCL_ID, A, min), 
      A)

Explanation

  • use aggregate to get the min of DAYS, grouped only by KCL_ID.

  • use merge to add back the original columns that match this subset. (It defaults to an inner join, which works in this case, because all rows in the subset are in the original and you don't want to add rows of the original that do not match the subset. Also, by default merge joins by overlapping column names, which is perfect).

Upvotes: 1

MKR
MKR

Reputation: 20095

OP is pretty close to the solution. The formula should be kept as DAYS ~ KCL_ID in aggregate. And then merge on both DAYS and KCL_ID columns with original dataframe.

merge(df, aggregate(DAYS ~ KCL_ID, data = df, min), by = c("DAYS", "KCL_ID"))
#   DAYS KCL_ID  BMI
# 1  183     22 25.7
# 2  582     21 32.0

Data

df <- data.frame(KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L),
DAYS = c(1449, 1814, 582, 582, 947, 183),
BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7))

Upvotes: 0

Anurag Kaushik
Anurag Kaushik

Reputation: 179

The Dplyr Package in R is pretty useful for this one imo.

library(dplyr)

dfx <- data.frame(
KCL_ID = c(21L, 21L, 21L, 22L, 22L, 22L),
DAYS = c(1449, 1814, 582, 582, 947, 183),
BMI = c(26.4, 28.7, 32, 25.3, 25.2, 25.7)
)

dfx %>% #read the %>% as "then do"
  group_by(KCL_ID)%>% 
  summarise( smallestDays = min(DAYS))%>% # find the smallest days
  left_join(dfx, by = c("smallestDays" ="DAYS", "KCL_ID" = "KCL_ID"))%>% as.data.frame() #Joining on both columns should give you what you need

Is this what you want?

    KCL_ID smallestDays  BMI
1     21          582     32.0
2     22          183     25.7

Upvotes: 0

Related Questions