Reputation: 55
I would like to find out how orders are distributed according to the number of localities. So, how many order numbers are going to one locality, how many two and so on.
Example:
Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5",
"x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
(test <- data.frame(cbind(Nr, location)))
> test
Nr location
1 x1 a
2 x2 b
3 x2 a
4 x2 b
5 x3 c
6 x4 a
7 x4 a
8 x4 a
9 x5 a
10 x5 b
11 x5 c
12 x6 d
Is my dataset.
With:
test %>% group_by(Nr, location) %>% summarise(quantity = n()) -> test2
I get my table sorted by order number and location including the corresponding quantity:
# A tibble: 9 x 3
# Groups: Nr [6]
Nr location quantity
<fct> <fct> <int>
1 x1 a 1
2 x2 a 1
3 x2 b 2
4 x3 c 1
5 x4 a 3
6 x5 a 1
7 x5 b 1
8 x5 c 1
9 x6 d 1
now I want to know how often a order number visits only one location how often two and so on.
The result should be a table like the next one:
Table
n_Order NumberOfObservations
1 6
2 3
3 3
.... ....
Unfortunately, I don't know how to do this. Can someone help me?
Upvotes: 2
Views: 121
Reputation: 887048
An option is to create a column of number of distinct elements of 'location' by 'Nr' and get the count
library(dplyr)
test %>%
group_by(Nr) %>%
mutate(n_Order = n_distinct(location)) %>%
ungroup %>%
count(n_Order)
# A tibble: 3 x 2
# n_Order n
# <int> <int>
#1 1 6
#2 2 3
#3 3 3
Or in base R
with ave
with(test, table(ave(location, Nr, FUN = function(x) length(unique(x)))))
Upvotes: 3
Reputation: 2753
Here's something that will give the same output as your figure:
library(data.table)
test <- as.data.table(test)
> str(test)
Classes ‘data.table’ and 'data.frame': 12 obs. of 2 variables:
$ order : chr "x1" "x2" "x2" "x2" ...
$ location: chr "a" "b" "a" "b" ...
- attr(*, ".internal.selfref")=<externalptr>
> test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)][, .(observations = sum(total_qty)), by = .(num_locations)]
num_locations observations
1: 1 6
2: 2 3
3: 3 3
I first group by order
and create a new variable num_locations
for the unique location
that an order goes to and get the total order qty as total_qty
. Then all that remains is to sum by num_locations
to get the desired output.
The above code does it one line by using 'chaining':
dat[ ... some operations ... ][... other operations ...]
you can do it in two steps by saving the first chain as test2
, say.
# intermediate data.table test2
> test2 <- test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)]
> test2[, .(observations = sum(total_qty)), by = .(num_locations)]
num_locations observations
1: 1 6
2: 2 3
3: 3 3
Upvotes: 1