Reputation: 1270
Here is a sample of my data:
df <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
The logic is that I want to have three categories: Low, Medium and high.
As an example, considering the column book, the values 0 and 1= Low, 2= Medium, 3 and 4=High. Next, I want to calculate the percentage for each category. As you can see below, for Low in the column book, the percentage is 42.85. I want to have an output like this for all columns. Please consider this is just a sample. Thanks for your help
Class Low Midium High
book 42.85 xx xx
pen xx xx xx
desk xx xx xx
ipad xx xx xx
Upvotes: 1
Views: 123
Reputation: 1643
Here's a tidyverse
solution for you:
library(tidyverse)
df <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
df %>%
pivot_longer(1:4,
names_to = "Class",
values_to = "value") %>%
mutate(category = case_when(value %in% 0:1 ~ "l",
value == 2 ~ "m",
value %in% 3:4 ~ "h")) %>%
group_by(Class, category) %>%
count(category) %>%
pivot_wider(names_from = category, values_from = n) %>%
transmute(Class = Class,
High = h / sum(h, m, l)*100,
Medium = m / sum(h, m, l)*100,
Low = l / sum(h, m, l)*100)
And the resulting table:
# A tibble: 4 x 4
# Groups: Class [4]
Class High Medium Low
<chr> <dbl> <dbl> <dbl>
1 book 42.9 14.3 42.9
2 desk 35.7 28.6 35.7
3 ipad 35.7 28.6 35.7
4 pen 35.7 14.3 50
Upvotes: 1
Reputation: 160407
ret <- t(sapply(df, function(a) {
lbls <- factor(c("Low", "Medium", "High"))
ct <- cut(a, c(0, 2, 4, Inf), right = FALSE, labels = lbls)
table(ct)
}))
t(apply(ret, 1, function(z) 100*z/sum(z)))
# Low Medium High
# book 42.85714 28.57143 28.57143
# pen 50.00000 35.71429 14.28571
# desk 35.71429 50.00000 14.28571
# ipad 35.71429 50.00000 14.28571
As a data.frame:
out <- as.data.frame(t(apply(ret, 1, function(z) 100*z/sum(z))))
out$Class <- rownames(out)
# rownames(out) <- NULL # optional, if you don't want them
out <- out[,c(4,1:3)]
out
# Class Low Medium High
# book book 42.85714 28.57143 28.57143
# pen pen 50.00000 35.71429 14.28571
# desk desk 35.71429 50.00000 14.28571
# ipad ipad 35.71429 50.00000 14.28571
Upvotes: 2
Reputation: 11
Enclosed a possible solution. There might be a better one, but I would have come up with it "spontaneously".
Greetings
df_test <- read.table(header = TRUE, text =
"book pen desk ipad
3 4 3 4
3 0 0 3
0 3 0 2
1 3 2 1
4 1 4 3
0 0 3 1
2 1 3 2
0 2 1 0
4 2 2 2
0 1 2 1
1 4 1 4
2 0 1 3
4 3 2 0
4 0 4 2"
)
low <- list()
medium <- list()
high <- list()
for(i in 1:ncol(df_test)) # i=1
{
low[[i]] <- ifelse((df_test[,i]==0 | df_test[,i]==1),df_test[,i],NA)
low[[i]] <- sum(colSums(!is.na(t(low[[i]])))) / length(low[[i]]) *100
medium[[i]] <- ifelse((df_test[,i]==2 | df_test[,i]==3),df_test[,i],NA)
medium[[i]] <- sum(colSums(!is.na(t(medium[[i]])))) / length(medium[[i]]) *100
high[[i]] <- ifelse((df_test[,i]==4 | df_test[,i]==5),df_test[,i],NA)
high[[i]] <- sum(colSums(!is.na(t(high[[i]])))) / length(high[[i]]) *100
}
names(low) <- colnames(df_test)
names(medium) <- colnames(df_test)
names(high) <- colnames(df_test)
df_test_final <- data.frame("Class"=colnames(df_test),"Low"=NA,"Medium"=NA,"High"=NA)
df_test_final[,2] <- do.call(rbind,low)
df_test_final[,3] <- do.call(rbind,medium)
df_test_final[,4] <- do.call(rbind,high)
Upvotes: 0