Reputation: 1
df=
ID Order_nr     C     D
 1        1 N87.0 N87.0
 2        1 N87.1 N87.1
 3        1 N87.1 N87.1
 4        1 N87.1 N87.1
 4        2 N87.0 N87.1
 5        1   D06   D06
 6        1 N87.0 N87.0
 7        1 N87.1 N87.1
 7        2 N87.1 N87.1
 7        3 N87.0 N87.1
 7        4 N87.0 N87.1
 7        5 N87.0 N87.1
 7        6 N87.0 N87.1
 8        1 N87.0 N87.0
I have to create column D, which is set uniquely for every ID using Order_nr and C.
I need to do something like df$D = df$C[Order_nr == 1], applied within each ID.
ID 1 only appears once, so there isn't much to choose from, but ID 7 appears 6 times and I need to put N87.1 in all 6 of those rows, since df$C[Order_nr == 1] gives N87.1 for that ID.
I have tried to do this in numerous ways and failed. So far I have managed to get something close using nested for loops, but that wasn't perfect, and loops shouldn't be necessary anyway.
Here is what I have right now:
foo <- function(df) {
  C = df$C[df$Order_nr == 1]
}
ddply(df, .(ID), mutate, foo)
That doesn't seem to do anything, though. Could someone point me in the right direction?
On a side note: is there a specific way to refer to the different subsets that ddply creates and later combines into one data.frame? Let's say there are 10 different IDs and 5 to 10 rows for each ID. If I use ddply(df, .(ID), ...), how do I refer to the subset that has only ID = 1, 2, ...?
EDIT: Metrics' code did the trick by applying the head() function:
ddply(df1,.(ID),transform,E=head(C,1))
Upvotes: 0
Views: 297
Reputation: 36076
In terms of using ddply to assign a value for each row with mutate, this is how I would have approached it. I named the new column D2 so I could compare it to your column D.
ddply(df, .(ID), mutate, D2 = C[Order_nr == 1])
I think some of the trouble you were having has to do with your function foo. That function expects you to give it a data.frame, but when you use ddply with mutate you are working with the columns within the data.frame. I'm still looking for a ddply option that uses your original function, but I'm not sure if it will work out.
Edit
To follow up on your function foo, the first problem is that it doesn't visibly return anything: its last expression is an assignment, so the value is only returned invisibly. I always check my functions on a simple example to make sure they do what I want. Notice that
foo(df[df$ID == 7,])
prints nothing, which is a red flag that something is wrong.
I ended up changing your function to
foo = function(df) {
  C = as.character(df$C[df$Order_nr == 1])
  C
}
You could use this with ddply without mutate, since ddply then expects a function that takes the entire data.frame for each ID. However, that gives only one row per ID, so you'd have to combine the result with the original data using the merge answer from @RichieCotton. I'd stick to using the column names as in my example above.
ddply(df, .(ID), foo)
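A minimal sketch of that combination, assuming the plyr package is installed. The data frame toy, the object first_c, and the column name D2 are made up for illustration and are not from the question:

```r
library(plyr)

# Toy data in the same shape as the question (values are made up)
toy <- data.frame(
  ID       = c(1, 2, 2, 3, 3, 3),
  Order_nr = c(1, 1, 2, 1, 2, 3),
  C        = c("N87.0", "N87.1", "N87.0", "D06", "N87.0", "N87.1"),
  stringsAsFactors = FALSE
)

# Return the C value of the row with Order_nr == 1 for one ID's subset
foo <- function(df) {
  as.character(df$C[df$Order_nr == 1])
}

# One row per ID; the second column holds foo's result
first_c <- ddply(toy, .(ID), foo)
names(first_c)[2] <- "D2"

# Attach the per-ID value to every row of the original data
result <- merge(toy, first_c, by = "ID")
```

The intermediate first_c has one row per ID, which is why the merge step is needed to get back to one value per original row.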
Upvotes: 3
Reputation: 15458
Assuming that the data are already sorted by Order_nr before applying ddply, and that every ID has a row with Order_nr 1:
library(plyr)
ddply(df1,.(ID),transform,E=head(C,1))
   ID Order_nr     C     D     E
1   1        1 N87.0 N87.0 N87.0
2   2        1 N87.1 N87.1 N87.1
3   3        1 N87.1 N87.1 N87.1
4   4        1 N87.1 N87.1 N87.1
5   4        2 N87.0 N87.1 N87.1
6   5        1   D06   D06   D06
7   6        1 N87.0 N87.0 N87.0
8   7        1 N87.1 N87.1 N87.1
9   7        2 N87.1 N87.1 N87.1
10  7        3 N87.0 N87.1 N87.1
11  7        4 N87.0 N87.1 N87.1
12  7        5 N87.0 N87.1 N87.1
13  7        6 N87.0 N87.1 N87.1
14  8        1 N87.0 N87.0 N87.0
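Because head(C, 1) simply takes the first row of each group, the result depends on row order. Below is a hedged sketch of one way to remove that dependency, by sorting on ID and Order_nr first. The data frame shuffled is made up for illustration, and the plyr package is assumed to be installed:

```r
library(plyr)

# Made-up data whose rows are deliberately out of order
shuffled <- data.frame(
  ID       = c(7, 4, 7, 4, 7),
  Order_nr = c(2, 2, 1, 1, 3),
  C        = c("N87.0", "N87.0", "N87.1", "N87.1", "N87.0"),
  stringsAsFactors = FALSE
)

# Sort so that Order_nr == 1 is the first row within each ID
sorted <- shuffled[order(shuffled$ID, shuffled$Order_nr), ]

# Now head(C, 1) picks the C value belonging to Order_nr == 1
out <- ddply(sorted, .(ID), transform, E = head(C, 1))
```

Without the sorting step, head(C, 1) on shuffled would pick up N87.0 for both IDs, since that is the C value in the first row of each group.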
Upvotes: 2
Reputation: 121057
You don't need ddply, you need merge.
A reproducible dataset:
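set.seed(1)  # fix the random seed so the sampled data are reproducible
n_groups <- 8
# number of rows for each ID, between 1 and 6
n_reps <- sample(6, n_groups, replace = TRUE)
df <- data.frame(
  ID = rep(seq_len(n_groups), n_reps),
  Order_nr = unlist(lapply(n_reps, seq_len)),
  C = sample(letters, sum(n_reps), replace = TRUE)
)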
Create a lookup table that maps each ID to its C value at Order_nr == 1.
lookup <- subset(df, Order_nr == 1, c(ID, C))
colnames(lookup) <- c("ID", "D")
Now merge on the ID column.
merge(df, lookup, by = "ID")
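Applied to data shaped like the question's, the same lookup-and-merge steps might look like the sketch below. The data frame q_df is a hand-typed excerpt of the question's table, and all.x = TRUE is an added assumption so that an ID without an Order_nr == 1 row would be kept with D set to NA rather than dropped:

```r
# Excerpt of the question's data (IDs 4, 5 and 7 only)
q_df <- data.frame(
  ID       = c(4, 4, 5, 7, 7, 7),
  Order_nr = c(1, 2, 1, 1, 2, 3),
  C        = c("N87.1", "N87.0", "D06", "N87.1", "N87.1", "N87.0"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per ID, holding the C value at Order_nr == 1
lookup <- subset(q_df, Order_nr == 1, c(ID, C))
colnames(lookup) <- c("ID", "D")

# Left join so every original row receives its ID's D value
merged <- merge(q_df, lookup, by = "ID", all.x = TRUE)
```

Every row of an ID ends up with the same D, regardless of how the rows were ordered in the input.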
Upvotes: 2