cianius
cianius

Reputation: 2412

append columns and count frequency in R

I have a data frame that looks like:

Name       Value1    Value2     Value3
sample1     ttn      mth        lik
sample2     bae      ttn.1      apk
sample3     pas      kasd       mth


dat <- structure(list(Name = c("sample1", "sample2", "sample3"), Value1 = c("ttn", 
"bae", "pas"), Value2 = c("mth", "ttn.1", "kasd"), Value3 = c("lik", 
"apk", "mth")), .Names = c("Name", "Value1", "Value2", "Value3"
), row.names = c(NA, -3L), class = "data.frame")

I would like to rearrange and count frequency so it would look like:

  Value     Source1     Source2
  ttn       sample1
  mth       sample1     sample3
  lik       sample1

How do I do this?

Upvotes: 0

Views: 353

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

These solutions don't get you exactly where you want to be, but might be close enough for you to work from there.

First, some data:

temp <- structure(list(Name = c("sample1", "sample2", "sample3"), 
                       Value1 = c("ttn", "bae", "pas"), 
                       Value2 = c("mth", "ttn.1", "kasd"), 
                       Value3 = c("lik", "apk", "mth")), 
                  .Names = c("Name", "Value1", "Value2", "Value3"), 
                  class = "data.frame", row.names = c(NA, -3L))
temp
#      Name Value1 Value2 Value3
# 1 sample1    ttn    mth    lik
# 2 sample2    bae  ttn.1    apk
# 3 sample3    pas   kasd    mth

These data are in "wide" form. Use reshape() to get it into "long" form.

temp1 <- reshape(temp, direction = "long", 
                 idvar="Name", varying = 2:4, sep = "")
#              Name time Value
# sample1.1 sample1    1   ttn
# sample2.1 sample2    1   bae
# sample3.1 sample3    1   pas
# sample1.2 sample1    2   mth
# sample2.2 sample2    2 ttn.1
# sample3.2 sample3    2  kasd
# sample1.3 sample1    3   lik
# sample2.3 sample2    3   apk
# sample3.3 sample3    3   mth

Now, use aggregate() from base R or dcast() from the "reshape2" package to aggregate based on the "value" values.

aggregate(Name ~ Value, temp1, c)
#   Value             Name
# 1   apk          sample2
# 2   bae          sample2
# 3  kasd          sample3
# 4   lik          sample1
# 5   mth sample1, sample3
# 6   pas          sample3
# 7   ttn          sample1
# 8 ttn.1          sample2
require(reshape2)
dcast(temp1, Value ~ Name, value.var = "Value")
#   Value sample1 sample2 sample3
# 1   apk    <NA>     apk    <NA>
# 2   bae    <NA>     bae    <NA>
# 3  kasd    <NA>    <NA>    kasd
# 4   lik     lik    <NA>    <NA>
# 5   mth     mth    <NA>     mth
# 6   pas    <NA>    <NA>     pas
# 7   ttn     ttn    <NA>    <NA>
# 8 ttn.1    <NA>   ttn.1    <NA>

You also mentioned that you would like to "count frequency", in which case, table() might also be appropriate:

table(temp1$Value, temp1$Name)
# 
#       sample1 sample2 sample3
# apk         0       1       0
# bae         0       1       0
# kasd        0       0       1
# lik         1       0       0
# mth         1       0       1
# pas         0       0       1
# ttn         1       0       0
# ttn.1       0       1       0

Upvotes: 1

James
James

Reputation: 66834

Obviously you will have a ragged array here, so how about this:

sapply(unique(unlist(dat[-1])), function(x) dat[apply(dat[-1],1,function(y) x%in%y),1])
$ttn
[1] "sample1"

$bae
[1] "sample2"

$pas
[1] "sample3"

$mth
[1] "sample1" "sample3"

$ttn.1
[1] "sample2"

$kasd
[1] "sample3"

$lik
[1] "sample1"

$apk
[1] "sample2"

Upvotes: 2

Related Questions