Reputation: 4243
I have a dataframe that looks like this:
COLA COLB COLC COLD COLE
Name1 yes A AB uno
Name2 yes B AC dos
Name3 no C AB tres
Name4 no D AC cuatro
How do I create a proportion dataframe that shows the percentages of each selected column value along with the frequency:
ATTRIBUTE Percentages Frequency
*COLB* *Percentage* *Amount*
yes 50% 2
no 50% 2
*COLC* *Percentage *Amount*
A 25% 1
B 25% 1
C 25% 1
D 25% 1
*COLD* *Percentage* *Amount*
AB 50% 2
AC 50% 2
It doesn't need to look exactly like this but I need it to all be in one dataframe and include only the selected columns mentioned.
Any help would be great, thanks!
Upvotes: 2
Views: 630
Reputation: 21507
You can do the following:
dat <- data.frame(COLA=paste0("name",1:4),
"COLB"=c("yes", "yes", "no", "no"))
require(purrr)
col_to_stat <- function(col){
tmp <- table(col)
data.frame(ATTRIBUTE = names(tmp), Percentages = c(tmp/length(col)), Frequency = c(tmp),
stringsAsFactors = FALSE)
}
map_df(dat, col_to_stat, .id="col")
Which gives you:
col ATTRIBUTE Percentages Frequency
1 COLA name1 0.25 1
2 COLA name2 0.25 1
3 COLA name3 0.25 1
4 COLA name4 0.25 1
5 COLB no 0.50 2
6 COLB yes 0.50 2
If you want to print percentages instead of decimals have a look at: How to format a number as percentage in R?
P.S.: If you use tibble
instead of data.frame
you can use the following which is shorter:
tibble(ATTRIBUTE = names(tmp), Percentages = tmp/length(col), Frequency = tmp)
Upvotes: 2