user2647568

Reputation: 45

Creating new columns from a data.frame

I have a dataset in long format in which measurements (Time) are nested in network partners (NP), which are in turn nested in persons (ID). Here is an example of what it looks like (the real dataset has thousands of rows):

ID  NP  Time Outcome
1   11  1    4
1   11  2    3
1   11  3    NA
1   12  1    2
1   12  2    3
1   12  3    3
2   21  1    2
2   21  2    NA
2   21  3    NA
2   22  1    4
2   22  2    4
2   22  3    4

Now I would like to create 3 new variables:

a) The number of network partners (with no NA in the outcome at this measurement) that a specific person (ID) has at Time 1

b) The number of network partners (with no NA in the outcome at this measurement) that a specific person (ID) has at Time 2

c) The number of network partners (with no NA in the outcome at this measurement) that a specific person (ID) has at Time 3

So I would like to create a dataset like this:

ID  NP  Time Outcome  NP.T1  NP.T2  NP.T3
1   11  1    4        2      2      1
1   11  2    3        2      2      1
1   11  3    NA       2      2      1
1   12  1    2        2      2      1
1   12  2    3        2      2      1
1   12  3    3        2      2      1
2   21  1    2        2      1      1
2   21  2    NA       2      1      1
2   21  3    NA       2      1      1
2   22  1    4        2      1      1
2   22  2    4        2      1      1
2   22  3    4        2      1      1

I would really appreciate your help.

Upvotes: 0

Views: 185

Answers (1)

Metrics

Reputation: 15458

You can create a single variable rather than three: one count per combination of ID and Time. I am using ddply from the plyr package to count the non-missing outcomes within each of those groups.

    mydata <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L), NP = c(11L, 11L, 11L, 12L, 12L, 12L, 21L, 21L, 21L, 
    22L, 22L, 22L), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
    1L, 2L, 3L), Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA, 
    4L, 4L, 4L)), .Names = c("ID", "NP", "Time", "Outcome"), class = "data.frame", row.names = c(NA, 
    -12L))


    library(plyr)
    # Within each ID x Time group, count the rows whose Outcome is not NA
    mydata1 <- ddply(mydata, .(ID, Time), transform, NP.T = sum(!is.na(Outcome)))
    > mydata1
   ID NP Time Outcome NP.T
1   1 11    1       4    2
2   1 12    1       2    2
3   1 11    2       3    2
4   1 12    2       3    2
5   1 11    3      NA    1
6   1 12    3       3    1
7   2 21    1       2    2
8   2 22    1       4    2
9   2 21    2      NA    1
10  2 22    2       4    1
11  2 21    3      NA    1
12  2 22    3       4    1
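
If you would rather not load plyr, the same per-group count can be sketched in base R with ave(). This is only a minimal sketch, not a replacement for the ddply call above: the names df and valid are made up for illustration, and unlike ddply it leaves the rows in their original order.

```r
# Example data from the question, rebuilt with data.frame()
df <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = rep(c(11L, 12L, 21L, 22L), each = 3),
  Time    = rep(1:3, times = 4),
  Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA, 4L, 4L, 4L)
)

# 1 where Outcome is observed, 0 where it is NA (hypothetical helper vector)
valid <- as.integer(!is.na(df$Outcome))

# Sum the indicator within each ID x Time group and
# write the group total back onto every row of that group
df$NP.T <- ave(valid, df$ID, df$Time, FUN = sum)
```

Counting with !is.na() avoids comparing Outcome to the string "NA", which only works by accident because which() silently drops the resulting missing comparisons.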

Update: You can also use interaction() to add a variable (comb) that uniquely identifies each combination of ID and Time:

    mydata1 <- ddply(mydata, .(ID, Time), transform, NP.T = sum(!is.na(Outcome)), comb = interaction(ID, Time))
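
If you really do need the three separate columns from the question (NP.T1, NP.T2, NP.T3, repeated on every row of a person), one possible base R sketch is to loop over the time points and sum a per-time indicator within each ID. The names wide, tp and valid are assumptions for illustration only, and the column names rely on Time being coded as 1, 2, 3.

```r
# Example data from the question, rebuilt with data.frame()
wide <- data.frame(
  ID      = rep(1:2, each = 6),
  NP      = rep(c(11L, 12L, 21L, 22L), each = 3),
  Time    = rep(1:3, times = 4),
  Outcome = c(4L, 3L, NA, 2L, 3L, 3L, 2L, NA, NA, 4L, 4L, 4L)
)

for (tp in sort(unique(wide$Time))) {
  # 1 for rows measured at time tp with a non-missing Outcome, else 0
  valid <- as.integer(wide$Time == tp & !is.na(wide$Outcome))
  # Total per person, repeated on every row of that person
  wide[[paste0("NP.T", tp)]] <- ave(valid, wide$ID, FUN = sum)
}
```

On the sample data this gives 2, 2, 1 for ID 1 and 2, 1, 1 for ID 2, which matches the desired table in the question.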

Upvotes: 2
