Reputation: 2412
I have a data frame that looks like:
Name Value1 Value2 Value3
sample1 ttn mth lik
sample2 bae ttn.1 apk
sample3 pas kasd mth
dat <- structure(list(Name = c("sample1", "sample2", "sample3"), Value1 = c("ttn",
"bae", "pas"), Value2 = c("mth", "ttn.1", "kasd"), Value3 = c("lik",
"apk", "mth")), .Names = c("Name", "Value1", "Value2", "Value3"
), row.names = c(NA, -3L), class = "data.frame")
I would like to rearrange and count frequency so it would look like:
Value Source1 Source2
ttn sample1
mth sample1 sample3
lik sample1
How do I do this?
Upvotes: 0
Views: 353
Reputation: 193517
These solutions don't get you exactly where you want to be, but might be close enough for you to work from there.
First, some data:
temp <- structure(list(Name = c("sample1", "sample2", "sample3"),
Value1 = c("ttn", "bae", "pas"),
Value2 = c("mth", "ttn.1", "kasd"),
Value3 = c("lik", "apk", "mth")),
.Names = c("Name", "Value1", "Value2", "Value3"),
class = "data.frame", row.names = c(NA, -3L))
temp
# Name Value1 Value2 Value3
# 1 sample1 ttn mth lik
# 2 sample2 bae ttn.1 apk
# 3 sample3 pas kasd mth
These data are in "wide" form. Use reshape()
to get it into "long" form.
temp1 <- reshape(temp, direction = "long",
idvar="Name", varying = 2:4, sep = "")
# Name time Value
# sample1.1 sample1 1 ttn
# sample2.1 sample2 1 bae
# sample3.1 sample3 1 pas
# sample1.2 sample1 2 mth
# sample2.2 sample2 2 ttn.1
# sample3.2 sample3 2 kasd
# sample1.3 sample1 3 lik
# sample2.3 sample2 3 apk
# sample3.3 sample3 3 mth
Now, use aggregate()
from base R or dcast()
from the "reshape2" package to aggregate based on the "value" values.
aggregate(Name ~ Value, temp1, c)
# Value Name
# 1 apk sample2
# 2 bae sample2
# 3 kasd sample3
# 4 lik sample1
# 5 mth sample1, sample3
# 6 pas sample3
# 7 ttn sample1
# 8 ttn.1 sample2
require(reshape2)
dcast(temp1, Value ~ Name, value.var = "Value")
# Value sample1 sample2 sample3
# 1 apk <NA> apk <NA>
# 2 bae <NA> bae <NA>
# 3 kasd <NA> <NA> kasd
# 4 lik lik <NA> <NA>
# 5 mth mth <NA> mth
# 6 pas <NA> <NA> pas
# 7 ttn ttn <NA> <NA>
# 8 ttn.1 <NA> ttn.1 <NA>
You also mentioned that you would like to "count frequency", in which case, table()
might also be appropriate:
table(temp1$Value, temp1$Name)
#
# sample1 sample2 sample3
# apk 0 1 0
# bae 0 1 0
# kasd 0 0 1
# lik 1 0 0
# mth 1 0 1
# pas 0 0 1
# ttn 1 0 0
# ttn.1 0 1 0
Upvotes: 1
Reputation: 66834
Obviously you will have a ragged array here, so how about this:
sapply(unique(unlist(dat[-1])), function(x) dat[apply(dat[-1],1,function(y) x%in%y),1])
$ttn
[1] "sample1"
$bae
[1] "sample2"
$pas
[1] "sample3"
$mth
[1] "sample1" "sample3"
$ttn.1
[1] "sample2"
$kasd
[1] "sample3"
$lik
[1] "sample1"
$apk
[1] "sample2"
Upvotes: 2