SonicProtein

Reputation: 850

R How to calculate a proportion of some value by column and by row in a data frame

Sample dataframe:

df <- data.frame(c('ab','cd','..'),c('ab','..','ab'),c('..','cd','cd'))

I'm trying to get the proportion of ab's for each column and each row, excluding the ..'s from both the numerator and the denominator.

Proportion of ab's = (number of ab's) / (number of entries that are not ..)

For example, for column 1 (values are ab, cd, and ..), the proportion of ab's is 0.5.

What I have so far:

fun <- function(x) {
    length(which(x == 'ab'))/length(which(x != '..'))
}
byColumn<- sapply(df[,1:ncol(df)],fun)
byRow <- sapply(df[1:nrow(df),],fun)

Expected result:

byColumn <- c(0.5,1.0,0.0)
byRow <- c(1.0,0.0,0.5)

Actual result:

byColumn <- c(0.5,1.0,0.0)
byRow <- c(0.5,1.0,0.0)

But byRow isn't working... it seems to be the same output as byColumn?

Upvotes: 1

Views: 1691

Answers (2)

mpalanco

Reputation: 13570

You can keep your function. For byRow, use the same code that works for byColumn, but transpose the data frame first:

byColumn <- sapply(df[, 1:ncol(df)], fun)
# After transposing, each original row is a column, so no column index is needed
byRow <- sapply(as.data.frame(t(df)), fun)

Output:

# By column
col1 col2 col3 
0.5  1.0  0.0 
# By row
 V1  V2  V3 
1.0 0.0 0.5 
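
For context, the original byRow matched byColumn because a data frame is a list of columns: sapply() walks over columns even when the data frame has been subset by row. As a minimal sketch of an alternative (not part of the answer above), apply() with MARGIN = 1 can walk the rows directly. It assumes the sample df and fun from the question; apply() converts the data frame to a character matrix first.

```r
# Sample data and function copied from the question
df <- data.frame(c('ab','cd','..'), c('ab','..','ab'), c('..','cd','cd'))
fun <- function(x) {
    length(which(x == 'ab')) / length(which(x != '..'))
}

# sapply() iterates over the columns of a data frame
byColumn <- sapply(df, fun)

# apply(..., 1, ...) coerces df to a matrix and iterates over its rows
byRow <- apply(df, 1, fun)

unname(byColumn)  # 0.5 1.0 0.0
unname(byRow)     # 1.0 0.0 0.5
```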

Upvotes: 1

David Arenburg

Reputation: 92282

I would define the function as follows (you can play around with the settings):

Propfunc <- function(x, dim = "col", equal = "ab", ignore = ".."){
  if(dim == "col") return(unname(colSums(x == equal)/colSums(x != ignore)))
  if(dim == "row") return(rowSums(x == equal)/rowSums(x != ignore))
  else stop("Unknown dim")
}

Propfunc(df)
## [1] 0.5 1.0 0.0
Propfunc(df, dim = "row")
## [1] 1.0 0.0 0.5
Propfunc(df, dim = "blabla")
## Error in Propfunc(df, dim = "blabla") : Unknown dim
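
To see why this works, it may help to look at the intermediate values. Comparing a data frame with a single string gives a logical matrix, and colSums()/rowSums() then count the TRUE cells. The sketch below uses the sample df from the question; the names is_ab and is_kept are only illustrative.

```r
df <- data.frame(c('ab','cd','..'), c('ab','..','ab'), c('..','cd','cd'))

is_ab   <- df == 'ab'   # logical matrix: TRUE where the cell equals 'ab'
is_kept <- df != '..'   # logical matrix: TRUE where the cell is not ignored

# TRUE counts as 1, so the sums are per-row and per-column counts
rowProp <- unname(rowSums(is_ab) / rowSums(is_kept))
colProp <- unname(colSums(is_ab) / colSums(is_kept))

rowProp  # 1.0 0.0 0.5
colProp  # 0.5 1.0 0.0
```

Note that a row or column made up entirely of .. would give 0/0, which R reports as NaN.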

Upvotes: 3
