Reputation: 1829
I have a dataframe in called df, there are three column lets say,
Region ID Salary
1 A1 100
1 A2 1001
1 A3 2000
1 A4 2431
1 A5 1001
..............
..............
2 A6 1002
2 A7 1002
2 A8 1002
3 A9 3001
3 A10 3001
3 A11 4001
Now I want to sort column Salary by the occurrence of them by Region, that is using frequency table or something, get the probability of occurrence per region and sort them. Please assume that the dataset is large enough (1000 rows)
P.S: Can anyone suggest a good method to do some. Please use column name in your answers since the real table has some column in the middle
Thanks in advance
**EDIT 1**
I think I was not clear enough, thanks for all those who replied, I sincerely apologise for not being clear:
With the current dataset we need to create a frequency table say:
Region Salary(bin) Count
1 1K 6
1 5K 3
1 2K 2
1 15K 2
1 0.5K 2
1 24K 1
1 0K 0
using this we can classify add a new columns in our data frame df called bin(bucket from histogram)
Region ID Salary (bin) Count
1 A1 100 1K 6
1 A2 1001 2K 2
1 A3 2000 2K 2
1 A4 2431 5K 3
..........................So on...............
We can do the above using:
df$bin <- cut(df$salary, breaks=hist(df$salary)$breaks)
After sorting by Region and Count and Salary we get:
Region ID Salary (bin) Count
1 A1 100 1K 6
1 A4 2431 5K 3
1 A3 2000 2K 2
1 A2 1001 2K 2
As you can see, we need to create frequency table for each region and do sort. I did the above using Tableau but I want to automate this in R
Hope I was clear
Upvotes: 2
Views: 13048
Reputation: 13580
Base R
df <- transform(df, freq= ave(seq(nrow(df)), Salary, FUN=length))
df[order(-df$freq), ]
Output:
Region ID Salary freq
6 2 A6 1002 3
7 2 A7 1002 3
8 2 A8 1002 3
2 1 A2 1001 2
5 1 A5 1001 2
9 3 A9 3001 2
10 3 A10 3001 2
1 1 A1 100 1
3 1 A3 2000 1
4 1 A4 2431 1
11 3 A11 4001 1
dplyr
library(dplyr)
df %>%
add_count(Salary) %>%
arrange(desc(n))
Output:
Region ID Salary n
1 2 A6 1002 3
2 2 A7 1002 3
3 2 A8 1002 3
4 1 A2 1001 2
5 1 A5 1001 2
6 3 A9 3001 2
7 3 A10 3001 2
8 1 A1 100 1
9 1 A3 2000 1
10 1 A4 2431 1
11 3 A11 4001 1
Upvotes: 6
Reputation: 1856
library(dplyr)
data %>% group_by(Region, Salary) %>% summarise(n=n()) %>% ungroup() %>% arrange(-n)
(edited: incloude commentary on the ungroup
from Jaap, good point)
Upvotes: 2
Reputation: 24198
One possible approach is to use data.table
to add freq
column, then sort your data accordingly:
library(data.table)
setDT(df)[,freq := .N, by = c("Region","Salary")]
# Sort
df[order(freq, decreasing = T),]
# As a oneliner (thx @Jaap)
setDT(df)[, freq := .N, by = .(Region,Salary)][order(-freq)]
Upvotes: 10