user330
user330

Reputation: 1270

calculating percentages by category them in R

Here is a sample of my data:

df <- read.table(header = TRUE, text =
                     "book  pen desk    ipad
3   4   3   4
3   0   0   3
0   3   0   2
1   3   2   1
4   1   4   3
0   0   3   1
2   1   3   2
0   2   1   0
4   2   2   2
0   1   2   1
1   4   1   4
2   0   1   3
4   3   2   0
4   0   4   2"
                 
)

The logic is that I want to have three categories: Low, Medium and high.

As an example, considering the column book, the values 0 and 1= Low, 2= Medium, 3 and 4=High. Next, I want to calculate the percentage for each category. As you can see below, for Low in the column book, the percentage is 42.85. I want to have an output like this for all columns. Please consider this is just a sample. Thanks for your help

Class   Low Midium  High
book    42.85   xx  xx
pen xx  xx  xx
desk    xx  xx  xx
ipad    xx  xx  xx

Upvotes: 1

Views: 123

Answers (3)

SCDCE
SCDCE

Reputation: 1643

Here's a tidyverse solution for you:

library(tidyverse)

df <- read.table(header = TRUE, text =
                   "book  pen desk    ipad
3   4   3   4
3   0   0   3
0   3   0   2
1   3   2   1
4   1   4   3
0   0   3   1
2   1   3   2
0   2   1   0
4   2   2   2
0   1   2   1
1   4   1   4
2   0   1   3
4   3   2   0
4   0   4   2"
                 
)

df %>% 
  pivot_longer(1:4, 
               names_to = "Class", 
               values_to = "value") %>% 
  mutate(category = case_when(value %in% 0:1 ~ "l",
                              value == 2 ~ "m",
                              value %in% 3:4 ~ "h")) %>% 
  group_by(Class, category) %>% 
  count(category) %>% 
  pivot_wider(names_from = category, values_from = n) %>% 
  transmute(Class = Class,
            High = h / sum(h, m, l)*100,
            Medium = m / sum(h, m, l)*100,
            Low = l / sum(h, m, l)*100)

And the resulting table:

# A tibble: 4 x 4
# Groups:   Class [4]
  Class  High Medium   Low
  <chr> <dbl>  <dbl> <dbl>
1 book   42.9   14.3  42.9
2 desk   35.7   28.6  35.7
3 ipad   35.7   28.6  35.7
4 pen    35.7   14.3  50  

Upvotes: 1

r2evans
r2evans

Reputation: 160407

ret <- t(sapply(df, function(a) {
  lbls <- factor(c("Low", "Medium", "High"))
  ct <- cut(a, c(0, 2, 4, Inf), right = FALSE, labels = lbls)
  table(ct)
}))

t(apply(ret, 1, function(z) 100*z/sum(z)))
#           Low   Medium     High
# book 42.85714 28.57143 28.57143
# pen  50.00000 35.71429 14.28571
# desk 35.71429 50.00000 14.28571
# ipad 35.71429 50.00000 14.28571

As a data.frame:

out <- as.data.frame(t(apply(ret, 1, function(z) 100*z/sum(z))))
out$Class <- rownames(out)
# rownames(out) <- NULL # optional, if you don't want them
out <- out[,c(4,1:3)]
out
#      Class      Low   Medium     High
# book  book 42.85714 28.57143 28.57143
# pen    pen 50.00000 35.71429 14.28571
# desk  desk 35.71429 50.00000 14.28571
# ipad  ipad 35.71429 50.00000 14.28571

Upvotes: 2

C_P_2
C_P_2

Reputation: 11

Enclosed a possible solution. There might be a better one, but I would have come up with it "spontaneously".

Greetings

df_test <- read.table(header = TRUE, text =
                   "book  pen desk    ipad
3   4   3   4
3   0   0   3
0   3   0   2
1   3   2   1
4   1   4   3
0   0   3   1
2   1   3   2
0   2   1   0
4   2   2   2
0   1   2   1
1   4   1   4
2   0   1   3
4   3   2   0
4   0   4   2"
                 
)

low <- list()
medium <- list()
high <- list()

for(i in 1:ncol(df_test)) # i=1
{
  low[[i]] <- ifelse((df_test[,i]==0 | df_test[,i]==1),df_test[,i],NA)
  low[[i]] <- sum(colSums(!is.na(t(low[[i]])))) / length(low[[i]]) *100
  medium[[i]] <- ifelse((df_test[,i]==2 | df_test[,i]==3),df_test[,i],NA)
  medium[[i]] <- sum(colSums(!is.na(t(medium[[i]])))) / length(medium[[i]]) *100
  high[[i]] <- ifelse((df_test[,i]==4 | df_test[,i]==5),df_test[,i],NA)
  high[[i]] <- sum(colSums(!is.na(t(high[[i]])))) / length(high[[i]]) *100
}

names(low) <- colnames(df_test)
names(medium) <- colnames(df_test)
names(high) <- colnames(df_test)

df_test_final <- data.frame("Class"=colnames(df_test),"Low"=NA,"Medium"=NA,"High"=NA)

df_test_final[,2] <- do.call(rbind,low)
df_test_final[,3] <- do.call(rbind,medium)
df_test_final[,4] <- do.call(rbind,high)

Upvotes: 0

Related Questions