Reputation: 1112
I have a data frame which looks like:
df = read.table(text="S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)
For each row, I would like to count the strings made of two identical letters (i.e. "AA", "CC", "GG", or "TT"). What I did was use the table() function to count all elements, then generate another list indicating whether each name is "homo" (one of those four). I tried to subset the first list with the second, but it didn't work. Here is my script:
A <- apply(df,1, function(x) table(x))
B <- apply(df,1, function(x) (names(table(x)) %in% c("AA","CC","GG","TT")))
A[B] ## this didn't work
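A minimal sketch of why A[B] fails, using two hypothetical rows rather than the full data: A and B are both lists, and '[' does not accept a list as a subscript, so each table has to be subset by its own logical vector.

```r
# Two hypothetical rows: A holds per-row tables, B holds per-row logical masks
A <- list(table(c("GG", "AA", "GG", "AG")), table(c("CC", "TT", "TC")))
B <- lapply(A, function(tbl) names(tbl) %in% c("AA", "CC", "GG", "TT"))

# Subsetting a list with a list is an error; capture the message instead of stopping
msg <- tryCatch(A[B], error = function(e) conditionMessage(e))
print(msg)

# Subsetting each table by its own mask works
first_row <- A[[1]][B[[1]]]
print(first_row)
```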
I expect the following data frame to be generated:
2 3
1 3
2 4
4 1
2 3
1 5
I'd appreciate any help.
Upvotes: 3
Views: 763
Reputation: 886938
We could do this with a single apply:
t(apply(df, 1, function(x) {
  tbl <- table(x)
  tbl[names(tbl) %in% c("AA", "CC", "GG", "TT")]
}))
# [,1] [,2]
#[1,] 2 3
#[2,] 1 3
#[3,] 2 4
#[4,] 4 1
#[5,] 2 3
#[6,] 1 5
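The single apply above simplifies to a 6 x 2 matrix only because every row here contains the same number (two) of distinct homozygous genotypes; with other data apply() would return a list instead. A minimal sketch of a variant (not from the original answer) that fixes the factor levels so every row yields counts for all four genotypes:

```r
df <- read.table(text = "S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)

homo <- c("AA", "CC", "GG", "TT")
# Fixed factor levels make every row's table the same length (4), so apply()
# always simplifies to a matrix; values outside `homo` become NA and table()
# drops them by default
counts <- t(apply(df, 1, function(x) table(factor(x, levels = homo))))
print(counts)
```

Genotypes absent from a row show up as 0 rather than disappearing, so the columns keep a fixed meaning.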
Upvotes: 3
Reputation: 28441
Try mapply. It takes the corresponding elements of A and B in turn and applies '[' to each pair. The column names (AA, GG) are taken from the first row's table, so they do not match every row; change them as you see fit:
t(mapply('[', A, B))
AA GG
[1,] 2 3
[2,] 1 3
[3,] 2 4
[4,] 4 1
[5,] 2 3
[6,] 1 5
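If the per-row genotype labels matter, one option (a sketch, not part of the original answer) is Map(), which is mapply() with SIMPLIFY = FALSE. It returns a list with one named table per row instead of a matrix labelled after row 1:

```r
df <- read.table(text = "S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)

# Same A and B as in the question; both are lists because the tables differ in length
A <- apply(df, 1, function(x) table(x))
B <- apply(df, 1, function(x) names(table(x)) %in% c("AA", "CC", "GG", "TT"))

# Subset each table by its own mask, keeping that row's genotype names
homo_list <- Map(`[`, A, B)
print(homo_list[[2]])
```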
As mentioned by CathG, you can avoid calculating B with:
t(sapply(A, function(x){x[grepl("([A-Z])\\1", names(x))]}))
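The regular expression "([A-Z])\\1" uses a backreference: it matches a capital letter immediately followed by the same letter, so for two-letter genotype codes it selects exactly the homozygous names. A small sketch on hypothetical names; the anchored variant is an addition that also rejects longer strings:

```r
genotypes <- c("AA", "AG", "CC", "TC", "TT", "GG")

# Unanchored: TRUE if any letter is immediately repeated
homo_flags <- grepl("([A-Z])\\1", genotypes)
print(homo_flags)

# Anchored: the whole string must be one letter repeated twice, so "AAG" is rejected
strict_flags <- grepl("^([A-Z])\\1$", c(genotypes, "AAG"))
print(strict_flags)
```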
Upvotes: 4
Reputation: 92282
I don't like apply here because it converts the data frame to a matrix, and especially apply(df, 1, ...) because it works row by row. Instead, I would suggest a helper function that uses sapply combined with rowSums (which operates on the matrix that sapply returns):
f <- function(x, y) rowSums(sapply(x, `%in%`, y))
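To see what f does, here is a sketch of its intermediate step: sapply() applies %in% to each column, producing a logical matrix with one row per data-frame row and one column per sample, and rowSums() counts the TRUE values in each row.

```r
df <- read.table(text = "S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)

f <- function(x, y) rowSums(sapply(x, `%in%`, y))

# 6 x 6 logical matrix: is each cell "AA" or "CC"?
m <- sapply(df, `%in%`, c("AA", "CC"))
print(dim(m))
# Counting TRUEs per row gives exactly what f() returns
print(rowSums(m))
print(f(df, c("AA", "CC")))
```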
Then you could do (without calculating A and B):
cbind(f(df, c("AA", "CC")),
f(df, c("GG", "TT")))
# [,1] [,2]
# [1,] 2 3
# [2,] 1 3
# [3,] 2 4
# [4,] 4 1
# [5,] 2 3
# [6,] 1 5
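The two-column cbind() above reproduces the expected output because no row contains both AA and CC, or both GG and TT. A sketch that does not rely on that, reusing the same f() to give each homozygous genotype its own column:

```r
df <- read.table(text = "S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)

f <- function(x, y) rowSums(sapply(x, `%in%`, y))

# One column per genotype; sapply() over a character vector names the columns after it
homo <- c("AA", "CC", "GG", "TT")
counts <- sapply(homo, function(g) f(df, g))
print(counts)
```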
Or just (depending on what you are looking for):
f(df, c("AA", "CC", "GG", "TT"))
# [1] 5 4 6 5 5 6
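If only the total per row is needed, a sketch that avoids listing the genotypes compares the first and second letter of every cell. It assumes every entry is exactly two characters; note that a code such as "00" for a missing call would also be counted as homozygous.

```r
df <- read.table(text = "S00001 S00002 S00003 S00004 S00005 S00006
GG AA GG AA GG AG
CC TT TT TC TC TT
TT CC CC TT TT TT
AA AA GG AA AG AA
TT CC CC TT TC TT
GG GG GG AA GG GG", header = TRUE, stringsAsFactors = FALSE)

m <- as.matrix(df)
# substr() keeps the matrix dimensions, so the comparison is done cell by cell
is_homo <- substr(m, 1, 1) == substr(m, 2, 2)
n_homo <- rowSums(is_homo)
print(n_homo)
```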
Upvotes: 3