Karl Räis

Reputation: 1

R: ddply() guidance needed

df=
ID  Order_nr  C      D
1   1         N87.0  N87.0
2   1         N87.1  N87.1
3   1         N87.1  N87.1
4   1         N87.1  N87.1
4   2         N87.0  N87.1
5   1         D06    D06
6   1         N87.0  N87.0
7   1         N87.1  N87.1
7   2         N87.1  N87.1
7   3         N87.0  N87.1
7   4         N87.0  N87.1
7   5         N87.0  N87.1
7   6         N87.0  N87.1
8   1         N87.0  N87.0

I have to create column D, which is set uniquely for every ID using Order_nr and C. I need something like df$D = df$C[Order_nr == 1], evaluated separately for each ID. ID 1 appears only once, so there isn't much to choose from, but ID 7 appears 6 times and I need N87.1 on all 6 of those rows, since df$C[Order_nr == 1] gives N87.1 for that ID.

I have tried to do this in numerous ways and failed. So far I have managed to get something close using nested for loops, but the result wasn't quite right, and loops shouldn't be needed here anyway.

Here is what I have right now:

foo <- function(df) {
  C = df$C[df$Order_nr == 1] }
ddply( df, .(ID),mutate, foo)

That doesn't seem to do anything, though. Could someone point me in the right direction?

On a side note: is there a specific way to refer to the different subsets that ddply creates and later puts back together into one data.frame? Let's say there are 10 different IDs with 5 to 10 rows each. If I use ddply(df,.(ID),...), how do I refer to the subset that has only ID = 1, 2, ...?

EDIT: Metrics' code did the trick by applying the head() function:

ddply(df1,.(ID),transform,E=head(C,1))

Upvotes: 0

Views: 297

Answers (3)

aosmith

Reputation: 36076

In terms of using ddply with mutate to assign a value to each row, this is how I would have approached it. I named the new column D2 so I could compare it to your column D.

ddply(df, .(ID), mutate, D2 = C[Order_nr == 1])

I think some of the trouble you were having has to do with your function foo. That function expects you to give it a data.frame, but when you use ddply with mutate you are working with the columns within the data.frame. I'm still looking for a ddply option that uses your original function, but I'm not sure it will work out.
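
A self-contained sketch of the call above, assuming the plyr package is installed. The data frame is the question's table typed in by hand (without the target column D), and `out` is just an illustrative name:

```r
library(plyr)

# The question's data, rebuilt by hand (column D left out on purpose).
df <- data.frame(
  ID       = c(1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 7, 7, 7, 8),
  Order_nr = c(1, 1, 1, 1, 2, 1, 1, 1, 2, 3, 4, 5, 6, 1),
  C        = c("N87.0", "N87.1", "N87.1", "N87.1", "N87.0", "D06", "N87.0",
               "N87.1", "N87.1", "N87.0", "N87.0", "N87.0", "N87.0", "N87.0"),
  stringsAsFactors = FALSE
)

# Within each ID, take the value of C on the row where Order_nr is 1;
# mutate recycles that single value across all rows of the group.
out <- ddply(df, .(ID), mutate, D2 = C[Order_nr == 1])

# All six rows of ID 7 should now carry the same D2 value.
out[out$ID == 7, ]
```

This relies on every ID having exactly one row with Order_nr == 1; a group with none or several such rows would make the call fail.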

Edit

To follow up on your function foo, the first problem is that it didn't visibly return anything: its last expression is an assignment, so R returns the value invisibly. I always check my functions on a simple example to make sure they do what I want. Notice that

foo(df[df$ID == 7,])

doesn't print an answer, which is a red flag that something is wrong.
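
One nuance that can be checked in base R: a function whose last expression is an assignment does hand back a value, but invisibly, so nothing is printed at the console. The sketch below uses `withVisible()` to show this; `foo_original` and `grp` are illustrative names, and `grp` is a hand-made stand-in for one ID's rows:

```r
# The question's function, unchanged: the body ends with an assignment.
foo_original <- function(df) {
  C = df$C[df$Order_nr == 1]
}

# A small stand-in for the rows of a single ID.
grp <- data.frame(
  Order_nr = 1:3,
  C        = c("N87.1", "N87.1", "N87.0"),
  stringsAsFactors = FALSE
)

# withVisible() reports both the value and whether it would auto-print.
res <- withVisible(foo_original(grp))
res$visible  # FALSE: calling foo_original(grp) at the console prints nothing
res$value    # the value is still there: "N87.1"
```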

I ended up changing your function to

foo = function(df) {
  C = as.character(df$C[df$Order_nr == 1])
  C
}

You could use this with ddply without mutate; in that case ddply expects a function that takes the entire data.frame of each group. However, the result has only one row per ID, so you'd have to combine it with the original data using the merge answer from @RichieCotton. I'd stick to using the column names as in my example above.

ddply(df, .(ID), foo)
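
A sketch of how the two steps could fit together, assuming plyr is installed. The data frame is a cut-down version of the question's table, and `per_id` and `out` are illustrative names. The value column is renamed by position because its automatically generated name is not something to rely on:

```r
library(plyr)

# A cut-down version of the question's data.
df <- data.frame(
  ID       = c(4, 4, 5, 7, 7, 7),
  Order_nr = c(1, 2, 1, 1, 2, 3),
  C        = c("N87.1", "N87.0", "D06", "N87.1", "N87.1", "N87.0"),
  stringsAsFactors = FALSE
)

foo <- function(df) {
  as.character(df$C[df$Order_nr == 1])
}

# One row per ID: the grouping column plus the value returned by foo.
per_id <- ddply(df, .(ID), foo)
names(per_id)[2] <- "D2"   # rename the value column by position

# Attach the per-ID value to every original row.
out <- merge(df, per_id, by = "ID")
out
```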

Upvotes: 3

Metrics

Reputation: 15458

Assuming that the data are already sorted by Order_nr within each ID before applying ddply, and that every ID has a row with Order_nr 1:

library(plyr)
ddply(df1,.(ID),transform,E=head(C,1))
   ID Order_nr     C     D     E
1   1        1 N87.0 N87.0 N87.0
2   2        1 N87.1 N87.1 N87.1
3   3        1 N87.1 N87.1 N87.1
4   4        1 N87.1 N87.1 N87.1
5   4        2 N87.0 N87.1 N87.1
6   5        1   D06   D06   D06
7   6        1 N87.0 N87.0 N87.0
8   7        1 N87.1 N87.1 N87.1
9   7        2 N87.1 N87.1 N87.1
10  7        3 N87.0 N87.1 N87.1
11  7        4 N87.0 N87.1 N87.1
12  7        5 N87.0 N87.1 N87.1
13  7        6 N87.0 N87.1 N87.1
14  8        1 N87.0 N87.0 N87.0
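
The sorting assumption matters, because head(C, 1) simply takes the first row of each group. A minimal sketch, assuming plyr is installed and using a deliberately shuffled, hand-made subset of the data (`unsorted`, `sorted_df` and `fixed` are illustrative names):

```r
library(plyr)

# Rows deliberately out of order: Order_nr 1 is not first within either ID.
df1 <- data.frame(
  ID       = c(7, 7, 7, 4, 4),
  Order_nr = c(3, 1, 2, 2, 1),
  C        = c("N87.0", "N87.1", "N87.1", "N87.0", "N87.1"),
  stringsAsFactors = FALSE
)

# Without sorting, head(C, 1) picks whatever row happens to come first.
unsorted <- ddply(df1, .(ID), transform, E = head(C, 1))

# Sorting by ID and then Order_nr puts the Order_nr == 1 row first in each group.
sorted_df <- df1[order(df1$ID, df1$Order_nr), ]
fixed <- ddply(sorted_df, .(ID), transform, E = head(C, 1))

unsorted$E  # picks up "N87.0", the value from the wrong row
fixed$E     # picks up "N87.1", the value at Order_nr == 1
```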

Upvotes: 2

Richie Cotton

Reputation: 121057

You don't need ddply, you need merge.

A reproducible dataset:

set.seed(1)  # fix the random seed so the sampled data are the same on every run
n_groups <- 8
n_reps <- sample(6, n_groups, replace = TRUE)
df <- data.frame(
  ID       = rep(seq_len(n_groups), n_reps),
  Order_nr = unlist(lapply(n_reps, seq_len)),
  C        = sample(letters, sum(n_reps), replace = TRUE)
)

Create a lookup table that maps each ID to its value of C at Order_nr == 1.

lookup <- subset(df, Order_nr == 1, c(ID, C))
colnames(lookup) <- c("ID", "D")

Now merge on the ID column.

merge(df, lookup, by = "ID")
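
One thing to keep in mind is that merge performs an inner join by default, so an ID with no Order_nr == 1 row would drop out of the result. A base R sketch on hand-made data, where ID 9 is a hypothetical group added only to show this (`inner` and `left` are illustrative names):

```r
# Hand-made data; ID 9 is hypothetical and has no Order_nr == 1 row.
df <- data.frame(
  ID       = c(4, 4, 7, 7, 7, 9),
  Order_nr = c(1, 2, 1, 2, 3, 2),
  C        = c("N87.1", "N87.0", "N87.1", "N87.1", "N87.0", "D06"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per ID, holding C at Order_nr == 1 under the name D.
lookup <- subset(df, Order_nr == 1, c(ID, C))
colnames(lookup) <- c("ID", "D")

# Default merge: the ID 9 row is dropped because it has no match in lookup.
inner <- merge(df, lookup, by = "ID")

# all.x = TRUE keeps every row of df and fills D with NA where nothing matches.
left <- merge(df, lookup, by = "ID", all.x = TRUE)

nrow(inner)  # 5
nrow(left)   # 6
```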

Upvotes: 2
