Asmita Poddar

Reputation: 532

Group data according to frequency of values in a column in a data frame using R

I have a data frame like the following:

a  b
1  23
2  34
1  34
3  45
1  56
3  567
2  67
2  90
1  91
3  98

I want to get the data frame with rows grouped according to the frequency of values in the first column. The output should be like the following:

a  b  freq
1  23   4
1  34   4
1  56   4
1  91   4
2  34   3
2  67   3
2  90   3
3  45   3
3  567  3
3  98   3

I have written the following code in R:

library(data.table)
setDT(df)[,freq := .N, by = "a"]
sorted = df[order(freq, decreasing = T),]
sorted

However, in the output I get, the rows for a = 2 and a = 3 are interleaved instead of being kept together:

    a  b freq
 1: 1  23    4
 2: 1  34    4
 3: 1  56    4
 4: 1  91    4
 5: 2  34    3
 6: 3  45    3
 7: 3  567   3
 8: 2  67    3
 9: 2  90    3
10: 3  98    3

How can I solve this problem?

Upvotes: 1

Views: 218

Answers (3)

Mouad_Seridi

Reputation: 2716

> df <- read.table(text = 'a  b
+ 1  23
+ 2  34
+ 1  34
+ 3  45
+ 1  56
+ 3  567
+ 2  67
+ 2  90
+ 1  91
+ 3  98', header = T, stringsAsFactors = F)
> library(dplyr)
> df %>% group_by(a) %>%
+   mutate(Freq = n()) %>%
+   ungroup() %>%
+   arrange(a)
# A tibble: 10 × 3
       a     b  Freq
   <int> <int> <int>
1      1    23     4
2      1    34     4
3      1    56     4
4      1    91     4
5      2    34     3
6      2    67     3
7      2    90     3
8      3    45     3
9      3   567     3
10     3    98     3
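Note that arrange(a) matches the requested output here only because the values of a happen to be in order of decreasing frequency in this data. The sketch below uses a made-up data frame (toy is hypothetical, not from the question) in which the smallest value of a is also the rarest. It assumes a dplyr version that provides add_count() with the name argument, and sorts on the frequency first:

```r
library(dplyr)

# Hypothetical data: a = 1 occurs once, a = 2 twice, a = 3 three times.
toy <- data.frame(a = c(1, 2, 2, 3, 3, 3),
                  b = c(10, 20, 21, 30, 31, 32))

out <- toy %>%
  add_count(a, name = "freq") %>%  # per-value row count as a new column
  arrange(desc(freq), a)           # most frequent first, ties broken by a

out$a
# 3 3 3 2 2 1
```

With arrange(a) alone, the same data would come back as 1 2 2 3 3 3, that is, the rarest value first.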

Upvotes: 1

akrun

Reputation: 886938

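We can use n() to count the rows in each group, then sort by descending frequency and, within equal frequencies, by a:

library(dplyr)
df1 %>%
    group_by(a) %>%
    mutate(freq = n()) %>%
    arrange(desc(freq), a)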
# A tibble: 10 x 3
# Groups:   a [3]
#       a     b  freq
#  <int> <int> <int>
# 1     1    23     4
# 2     1    34     4
# 3     1    56     4
# 4     1    91     4
# 5     2    34     3
# 6     2    67     3
# 7     2    90     3
# 8     3    45     3
# 9     3   567     3
#10     3    98     3
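For comparison, here is a minimal base R sketch of the same idea that needs no packages. It rebuilds the question's data by hand (the name sorted follows the question's code) and relies on order() leaving ties in their original order:

```r
# Data from the question.
df <- data.frame(a = c(1, 2, 1, 3, 1, 3, 2, 2, 1, 3),
                 b = c(23, 34, 34, 45, 56, 567, 67, 90, 91, 98))

# ave() applies length() within each group of a and returns one value per row.
df$freq <- ave(df$b, df$a, FUN = length)

# Sort by descending frequency, then by a so equal-frequency groups stay together.
sorted <- df[order(-df$freq, df$a), ]
rownames(sorted) <- NULL
sorted
```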

Upvotes: 1

minem

Reputation: 3650

It looks like you want setorder from the data.table package. You have ordered your data by freq only, so rows with the same freq keep their original order; you also need to order by column a.

setorder example:

> library(data.table)
> set.seed(12)
> df <- data.table(freq = sample(5, 5), a = sample(5, 5))
> df
   freq a
1:    1 1
2:    4 5
3:    3 2
4:    5 4
5:    2 3
> setorder(df, freq, a)
> df
   freq a
1:    1 1
2:    2 3
3:    3 2
4:    4 5
5:    5 4
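Applied to the data from the question, this might look like the sketch below. The data frame is rebuilt by hand here, and the minus sign in front of freq asks setorder for descending order on that column:

```r
library(data.table)

# Data from the question.
df <- data.frame(a = c(1, 2, 1, 3, 1, 3, 2, 2, 1, 3),
                 b = c(23, 34, 34, 45, 56, 567, 67, 90, 91, 98))

# Add the per-value row count by reference, as in the question.
setDT(df)[, freq := .N, by = a]

# Reorder in place: highest frequency first, then by a within equal frequencies.
setorder(df, -freq, a)
df
```

Because setorder reorders df by reference, there is no need to assign the result to a new variable.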

Upvotes: 1
