Anubhav Dikshit
Anubhav Dikshit

Reputation: 1829

Sort a dataframe column by the frequency of occurrence

I have a dataframe in called df, there are three column lets say,

Region ID  Salary
1      A1  100
1      A2  1001
1      A3  2000
1      A4  2431
1      A5  1001
..............
..............
2      A6  1002
2      A7  1002
2      A8  1002
3      A9  3001
3      A10 3001
3      A11 4001

Now I want to sort column Salary by the occurrence of them by Region, that is using frequency table or something, get the probability of occurrence per region and sort them. Please assume that the dataset is large enough (1000 rows)

P.S: Can anyone suggest a good method to do some. Please use column name in your answers since the real table has some column in the middle

Thanks in advance

                  **EDIT 1**

I think I was not clear enough, thanks for all those who replied, I sincerely apologise for not being clear:

With the current dataset we need to create a frequency table say:

Region  Salary(bin)     Count
1       1K              6                   
1       5K              3                   
1       2K              2                   
1       15K             2                   
1       0.5K            2                   
1       24K             1                   
1       0K              0                   

using this we can classify add a new columns in our data frame df called bin(bucket from histogram)

Region     ID  Salary  (bin)   Count
    1      A1  100     1K      6
    1      A2  1001    2K      2
    1      A3  2000    2K      2
    1      A4  2431    5K      3

..........................So on...............

We can do the above using:

df$bin <- cut(df$salary, breaks=hist(df$salary)$breaks)

After sorting by Region and Count and Salary we get:

Region     ID  Salary  (bin)   Count
    1      A1  100     1K      6
    1      A4  2431    5K      3
    1      A3  2000    2K      2
    1      A2  1001    2K      2

As you can see, we need to create frequency table for each region and do sort. I did the above using Tableau but I want to automate this in R

Hope I was clear

Upvotes: 2

Views: 13048

Answers (3)

mpalanco
mpalanco

Reputation: 13580

Base R

df <- transform(df, freq= ave(seq(nrow(df)), Salary, FUN=length))
df[order(-df$freq), ]

Output:

   Region  ID Salary freq
6       2  A6   1002    3
7       2  A7   1002    3
8       2  A8   1002    3
2       1  A2   1001    2
5       1  A5   1001    2
9       3  A9   3001    2
10      3 A10   3001    2
1       1  A1    100    1
3       1  A3   2000    1
4       1  A4   2431    1
11      3 A11   4001    1

dplyr

library(dplyr)
df %>%
  add_count(Salary) %>% 
  arrange(desc(n))

Output:

  Region  ID Salary n
1       2  A6   1002 3
2       2  A7   1002 3
3       2  A8   1002 3
4       1  A2   1001 2
5       1  A5   1001 2
6       3  A9   3001 2
7       3 A10   3001 2
8       1  A1    100 1
9       1  A3   2000 1
10      1  A4   2431 1
11      3 A11   4001 1

Upvotes: 6

PereG
PereG

Reputation: 1856

library(dplyr)
data %>% group_by(Region, Salary) %>% summarise(n=n()) %>% ungroup() %>% arrange(-n)

(edited: incloude commentary on the ungroupfrom Jaap, good point)

Upvotes: 2

mtoto
mtoto

Reputation: 24198

One possible approach is to use data.table to add freq column, then sort your data accordingly:

library(data.table)
setDT(df)[,freq := .N, by = c("Region","Salary")]

# Sort
df[order(freq, decreasing = T),]

# As a oneliner (thx @Jaap)
setDT(df)[, freq := .N, by = .(Region,Salary)][order(-freq)]

Upvotes: 10

Related Questions