Sunny League

Reputation: 149

Most frequent in each column in dataframe

How can I find the most frequent value in each column of a data frame and return the factor (the column name), the category (the value), and its frequency?

So I have the code as follows:

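library(ggplot2) # provides the diamonds data set

dfreqcommon = data.frame()

for (i in 1:ncol(diamonds)) {
  # frequency table of column i, converted to a long data frame
  dfc = data.frame(t(table(diamonds[,i])))
  # overwrite the first column with the name of column i
  dfc$Var1 = names(diamonds)[i]
  dfreqcommon = rbind(dfreqcommon, dfc)
}

names(dfreqcommon) = c("Factors","Categories","Frequency")

dfreqcommon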

But this returns every factor, category, and frequency. I only want the maximum frequency for each factor, along with its category. I tried changing dfc to

dfc = data.frame(max(t(table(diamonds[,i]))))

But it doesn't show the categories. Is there any way to fix this?

Upvotes: 1

Views: 666

Answers (2)

Cath

Reputation: 24074

Another way, with base R:

library(ggplot2) # only to get the diamonds data.frame

data.frame(Factors=colnames(diamonds), 
           t(sapply(diamonds, # apply following function to each column
                    function(x) {
                        t_x <- sort(table(x), decreasing=TRUE) # get the frequencies and sort them in decreasing order
                        list(Categories=names(t_x)[1], # name of the value with highest frequency
                             Frequency=t_x[1]) # highest frequency
                    })))
#        Factors Categories Frequency
#carat     carat        0.3      2604
#cut         cut      Ideal     21551
#color     color          G     11292
#clarity clarity        SI1     13065
#depth     depth         62      2239
#table     table         56      9881
#price     price        605       132
#x             x       4.37       448
#y             y       4.34       437
#z             z        2.7       767
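
If a plain data frame with ordinary (non-list) columns is preferred, the same idea can be sketched with which.max. This is only a minimal sketch: most_frequent and the toy data frame are hypothetical names made up for illustration, and ties are resolved by taking the first maximum in the order that table() returns.

```r
# Hypothetical helper (not part of the answer above):
# most frequent value and its count for every column of a data frame.
most_frequent <- function(df) {
  rows <- lapply(names(df), function(col) {
    counts <- table(df[[col]])   # frequency of each distinct value
    top <- which.max(counts)     # position of the first maximum
    data.frame(Factors = col,
               Categories = names(counts)[top],
               Frequency = as.integer(counts[top]),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)           # stack the one-row data frames
}

# Small made-up data set, so the sketch runs without ggplot2
toy <- data.frame(cut = c("Ideal", "Ideal", "Good"),
                  color = c("G", "E", "G"),
                  stringsAsFactors = FALSE)

result <- most_frequent(toy)
print(result)
```

Calling most_frequent(diamonds) after library(ggplot2) should work the same way, since df[[col]] extracts a plain vector from a tibble as well.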

Upvotes: 2

markdly

Reputation: 4534

Do you mean you want a result something like this? The following example shows how you could get the most frequently occurring value for each column in the ggplot2::diamonds dataset.

library(dplyr)
library(tidyr)
ggplot2::diamonds %>% 
  mutate_all(as.character) %>%
  gather(varname, value) %>%
  count(varname, value) %>%
  group_by(varname) %>%
  arrange(desc(n), .by_group = TRUE) %>%
  slice(1)

#> # A tibble: 10 x 3
#> # Groups:   varname [10]
#>    varname value     n
#>      <chr> <chr> <int>
#>  1   carat   0.3  2604
#>  2 clarity   SI1 13065
#>  3   color     G 11292
#>  4     cut Ideal 21551
#>  5   depth    62  2239
#>  6   price   605   132
#>  7   table    56  9881
#>  8       x  4.37   448
#>  9       y  4.34   437
#> 10       z   2.7   767
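
In dplyr 1.0.0 and tidyr 1.0.0 or later, mutate_all() and gather() are superseded by across() and pivot_longer(), and slice_max() can replace the arrange-then-slice step. A minimal sketch of the same pipeline with those functions follows; the toy tibble is made up for illustration, and with_ties = FALSE is an assumption that a single row per column is wanted when several values share the top count.

```r
library(dplyr)
library(tidyr)

# Small made-up data set; ggplot2::diamonds could be used in its place
toy <- tibble(cut = c("Ideal", "Ideal", "Good"),
              color = c("G", "E", "G"))

result <- toy %>%
  mutate(across(everything(), as.character)) %>%   # common type for all columns
  pivot_longer(everything(),
               names_to = "varname",
               values_to = "value") %>%            # one row per (column, value)
  count(varname, value) %>%                        # frequency of each value
  group_by(varname) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%       # keep the top count per column
  ungroup()

print(result)
```

The final ungroup() just returns an ungrouped tibble, which is usually easier to work with downstream.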

Upvotes: 1
