Reputation: 963
Here is my data:
name = c(letters[1:10], letters[1:10])
Yr <- rep(1:2, each = length (name))
P1 = c(NA, NA,NA, NA, "a", "a", "a", "d", "d", "d", NA, NA,NA,
NA, "a", "a", "a", "d", "d", "d")
P2 = c(NA, NA,NA, NA, "b", "b", "b", "b", "b", "c",
NA, NA,NA, NA, "b", "b", "b", "b", "b", "c")
myd <- data.frame (Yr, name, P1, P2)
For all unique value except in P1 and P2, I would like to create new variables named a and b, then score their presence or absence in P1 and P2 column. Just like the following:
Yr name P1 P2 a b c d
1 a <NA> <NA> NA NA NA NA
1 b <NA> <NA> NA NA NA NA
1 c <NA> <NA> NA NA NA NA
1 d <NA> <NA> NA NA NA NA
1 e a b 1 1 0 0
1 f a b 1 1 0 0
1 g a b 1 1 0 0
1 h d b 0 1 0 1
1 i d b 0 1 0 1
1 j d c 0 0 1 1
(1) new columns - The new columns are unique value in P1 and P2 column (except missing value)
(2) Values for new columns - For example, for case of "a" name (subject) both P1 and P2 is so NA. for case of "e", name P1 is "a", and P2 is "b", so "a" and "b", will get 1, everything else is 0, similarly based on presence of P1 or P2, the columns a, b, c,d are scored.
The full expected output:
Yr name P1 P2 a b c d
1 a <NA> <NA> NA NA NA NA
1 b <NA> <NA> NA NA NA NA
1 c <NA> <NA> NA NA NA NA
1 d <NA> <NA> NA NA NA NA
1 e a b 1 1 0 0
1 f a b 1 1 0 0
1 g a b 1 1 0 0
1 h d b 0 1 0 1
1 i d b 0 1 0 1
1 j d c 0 0 1 1
1 a <NA> <NA> NA NA NA NA
1 b <NA> <NA> NA NA NA NA
1 c <NA> <NA> NA NA NA NA
1 d <NA> <NA> NA NA NA NA
1 e a b 1 1 0 0
1 f a b 1 1 0 0
1 g a b 1 1 0 0
1 h d b 0 1 0 1
1 i d b 0 1 0 1
1 j d c 0 0 1 1
2 a <NA> <NA> NA NA NA NA
2 b <NA> <NA> NA NA NA NA
2 c <NA> <NA> NA NA NA NA
2 d <NA> <NA> NA NA NA NA
2 e a b 1 1 0 0
2 f a b 1 1 0 0
2 g a b 1 1 0 0
2 h d b 0 1 0 1
2 i d b 0 1 0 1
2 j d c 0 0 1 1
2 a <NA> <NA> NA NA NA NA
2 b <NA> <NA> NA NA NA NA
2 c <NA> <NA> NA NA NA NA
2 d <NA> <NA> NA NA NA NA
2 e a b 1 1 0 0
2 f a b 1 1 0 0
2 g a b 1 1 0 0
2 h d b 0 1 0 1
2 i d b 0 1 0 1
2 j d c 0 0 1 1
Upvotes: 2
Views: 83
Reputation: 118799
Here's a data.table
solution:
require(data.table)
dt <- data.table(myd)
dt[, grp := gl(4, 10)]
dt[, list(a = sum(unique(P1) == "a"),
b = sum(unique(P1) == "d"),
c = sum(unique(P2) == "b"),
d = sum(unique(P2) == "c")),
by=list(grp, Yr, name)]
# grp Yr name a b c d
# 1: 1 1 a NA NA NA NA
# 2: 1 1 b NA NA NA NA
# 3: 1 1 c NA NA NA NA
# 4: 1 1 d NA NA NA NA
# 5: 1 1 e 1 0 1 0
# 6: 1 1 f 1 0 1 0
# 7: 1 1 g 1 0 1 0
# 8: 1 1 h 0 1 1 0
# 9: 1 1 i 0 1 1 0
# 10: 1 1 j 0 1 0 1
# 11: 2 1 a NA NA NA NA
# 12: 2 1 b NA NA NA NA
# 13: 2 1 c NA NA NA NA
# 14: 2 1 d NA NA NA NA
# 15: 2 1 e 1 0 1 0
# 16: 2 1 f 1 0 1 0
# 17: 2 1 g 1 0 1 0
# 18: 2 1 h 0 1 1 0
# 19: 2 1 i 0 1 1 0
# 20: 2 1 j 0 1 0 1
# 21: 3 2 a NA NA NA NA
# 22: 3 2 b NA NA NA NA
# 23: 3 2 c NA NA NA NA
# 24: 3 2 d NA NA NA NA
# 25: 3 2 e 1 0 1 0
# 26: 3 2 f 1 0 1 0
# 27: 3 2 g 1 0 1 0
# 28: 3 2 h 0 1 1 0
# 29: 3 2 i 0 1 1 0
# 30: 3 2 j 0 1 0 1
# 31: 4 2 a NA NA NA NA
# 32: 4 2 b NA NA NA NA
# 33: 4 2 c NA NA NA NA
# 34: 4 2 d NA NA NA NA
# 35: 4 2 e 1 0 1 0
# 36: 4 2 f 1 0 1 0
# 37: 4 2 g 1 0 1 0
# 38: 4 2 h 0 1 1 0
# 39: 4 2 i 0 1 1 0
# 40: 4 2 j 0 1 0 1
# grp Yr name a b c d
Upvotes: 2
Reputation: 93813
Here's an attempt using a double merge
and model.matrix
:
result <- merge(
merge(myd,model.matrix(~ P1 - 1, data=myd),by="row.names",all.x=TRUE),
model.matrix(~ P2 - 1, data=myd),
by.x="Row.names",by.y="row.names",all.x=TRUE
)
# put the data in the 'right' order again and rename the columns
result <- result[order(as.numeric(result$Row.names)),]
names(result)[6:9] <- gsub("P1|P2","",names(result)[6:9])
Which gives a final product like this, though the column names are not quite listed in the order specified.
> result
Row.names Yr name P1 P2 a d b c
1 1 1 a <NA> <NA> NA NA NA NA
12 2 1 b <NA> <NA> NA NA NA NA
23 3 1 c <NA> <NA> NA NA NA NA
34 4 1 d <NA> <NA> NA NA NA NA
36 5 1 e a b 1 0 1 0
37 6 1 f a b 1 0 1 0
38 7 1 g a b 1 0 1 0
39 8 1 h d b 0 1 1 0
Upvotes: 3
Reputation: 13363
newcolname <- with(myd,sort(union(levels(P1),levels(P2))))
newcollist <- with(myd,lapply(newcolname,function(x) as.numeric(P1==x|P2==x)))
names(newcollist) <- newcolname
newcoldf <- data.frame(newcollist)
myd <- cbind(myd,newcoldf)
Upvotes: 2