fuul
fuul

Reputation: 55

Group by amount of location

I would like to find out how orders are distributed according to the number of localities. So, how many order numbers are going to one locality, how many two and so on.

Example:

Nr <- c("x1", "x2", "x2", "x2", "x3", "x4", "x4", "x4", "x5", "x5", "x5", 
"x6")
location <- c("a", "b", "a", "b", "c", "a", "a", "a", "a", "b", "c", "d")
(test <- data.frame(cbind(Nr, location)))

> test
Nr location
1  x1        a
2  x2        b
3  x2        a
4  x2        b
5  x3        c
6  x4        a
7  x4        a
8  x4        a
9  x5        a
10 x5        b
11 x5        c
12 x6        d

Is my dataset.

With:

test %>% group_by(Nr, location) %>% summarise(quantity = n()) -> test2

I get my table sorted by order number and location including the corresponding quantity:

# A tibble: 9 x 3
# Groups:   Nr [6]
Nr    location quantity
<fct> <fct>     <int>
1 x1    a             1
2 x2    a             1
3 x2    b             2
4 x3    c             1
5 x4    a             3
6 x5    a             1
7 x5    b             1
8 x5    c             1
9 x6    d             1

now I want to know how often a order number visits only one location how often two and so on.

The result should be a table like the next one:

Table
n_Order     NumberOfObservations
1           6
2           3
3           3
....        ....

Unfortunately, I don't know how to do this. Can someone help me?

Upvotes: 2

Views: 121

Answers (2)

akrun
akrun

Reputation: 887048

An option is to create a column of number of distinct elements of 'location' by 'Nr' and get the count

library(dplyr)
test %>% 
  group_by(Nr) %>% 
  mutate(n_Order = n_distinct(location)) %>% 
  ungroup %>% 
  count(n_Order)
# A tibble: 3 x 2
#  n_Order     n
#    <int> <int>
#1       1     6
#2       2     3
#3       3     3

Or in base R with ave

with(test, table(ave(location, Nr, FUN = function(x) length(unique(x)))))

Upvotes: 3

Gautam
Gautam

Reputation: 2753

Here's something that will give the same output as your figure:

library(data.table)

test <- as.data.table(test)

> str(test)
Classes ‘data.table’ and 'data.frame':  12 obs. of  2 variables:
 $ order   : chr  "x1" "x2" "x2" "x2" ...
 $ location: chr  "a" "b" "a" "b" ...
 - attr(*, ".internal.selfref")=<externalptr> 

> test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)][, .(observations = sum(total_qty)), by = .(num_locations)]
   num_locations observations
1:             1            6
2:             2            3
3:             3            3

I first group by order and create a new variable num_locations for the unique location that an order goes to and get the total order qty as total_qty. Then all that remains is to sum by num_locations to get the desired output.

The above code does it one line by using 'chaining':

dat[ ... some operations ... ][... other operations ...]

you can do it in two steps by saving the first chain as test2, say.

# intermediate data.table test2
> test2 <- test[, .(num_locations = length(unique(location)), total_qty = .N), by = .(order)]

> test2[, .(observations = sum(total_qty)), by = .(num_locations)]
   num_locations observations
1:             1            6
2:             2            3
3:             3            3

Upvotes: 1

Related Questions