Reputation: 938
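I have a data frame that looks like this:
GeneID person1 person2 ... person100 homo1 homo2 heter homo1count homo2count hetercount
1      AA      AC      ... AA        AA    CC    AC    25         50         25
2      .....
3      .....
For each row, how can I compute the counts (25, 50, 25 in the first row), i.e. how many persons have the genotype given in homo1, homo2 and heter?
I tried to use apply like this: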
g <- function(df, AA) {
x = table(df)
AA = x[which(names(x) == df$homo1)]
}
x = apply(temp,1,g)
But it didn't work; df$homo1 is always a list.
Thanks!
Upvotes: 0
Views: 104
Reputation: 9696
This is easier if you pivot to long format first, then aggregate. Something like this:
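library(reshape2)
library(dplyr)

# Build a small random example: 30 genes, 4 persons, 2 reference genotypes
g <- c('AC', 'AA', 'CC')
n <- 30
df <- data.frame(gene_id=1:n,
                 person1=sample(g,n,replace=TRUE),
                 person2=sample(g,n,replace=TRUE),
                 person3=sample(g,n,replace=TRUE),
                 person4=sample(g,n,replace=TRUE),
                 homo1=sample(g,n,replace=TRUE),
                 homo2=sample(g,n,replace=TRUE),
                 stringsAsFactors=FALSE)

# Melt the person columns to long format (one row per gene and person),
# count the matches per gene, then merge the counts back onto the wide data.
# Passing x=df keeps the original columns first in the merged result.
df %>% melt(id.vars=c("gene_id", "homo1", "homo2")) %>%
  group_by(gene_id) %>%
  summarise(homo1count=sum(homo1==value),
            homo2count=sum(homo2==value)) %>%
  merge(x=df)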
EDIT: sample output (first 8 rows; the data are random, so your values will differ):
  gene_id person1 person2 person3 person4 homo1 homo2 homo1count homo2count
1       1      AA      CC      AC      AA    AC    CC          1          1
2       2      AC      AA      CC      CC    CC    AA          2          1
3       3      AC      CC      CC      AA    CC    AA          2          1
4       4      AC      AC      AC      AA    AA    AA          1          1
5       5      CC      AC      AA      AC    AA    AC          1          2
6       6      CC      AC      CC      CC    AA    AA          0          0
7       7      AA      AA      AC      AA    CC    CC          0          0
8       8      AA      AC      AA      CC    AC    CC          1          1
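If you'd rather not reshape, the same counts can probably be had in base R with a row-wise comparison. This is only a minimal sketch: the toy values are made up, and it assumes the person columns can be picked out by the "person" name prefix (person_cols and genotypes are just illustrative names).

```r
# Toy data with the same layout as above (hypothetical values)
df <- data.frame(
  gene_id = 1:3,
  person1 = c("AA", "AC", "CC"),
  person2 = c("AC", "AC", "AA"),
  person3 = c("AA", "CC", "CC"),
  homo1   = c("AA", "CC", "AA"),
  homo2   = c("CC", "AA", "CC"),
  stringsAsFactors = FALSE
)

# Assumption: every genotype column starts with "person"
person_cols <- grep("^person", names(df), value = TRUE)
genotypes <- as.matrix(df[person_cols])

# A matrix compared with a vector of length nrow() is recycled down the
# columns, so each row is compared with its own homo1 / homo2 value.
df$homo1count <- rowSums(genotypes == df$homo1)
df$homo2count <- rowSums(genotypes == df$homo2)

print(df)
```

With 100 person columns this avoids building the long table; a hetercount column would follow the same pattern if your data has a heter column.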
Upvotes: 2