Ramiro Guzmán

Reputation: 45

Group by and ungroup according to the assigned values in R

Hi! I have the following question. group_by on a data frame can group information and estimate some values, but is there any function to ungroup the information and keep the estimated values next to the rows they correspond to? Here is an example:

base

| ID   | location |value|
| ---- | ---------|-----|
| 1    | a        |5    |
| 1    | a        |2    |
| 2    | b        |3    |
| 2    | d        |1    |
| 3    | c        |2    |
| 3    | c        |3    |

group_by(base, ID, location) %>% summarise(value = sum(value))

The result is:

| ID   | location |value|
| ---- | ---------|-----|
| 1    | a        |7    |
| 2    | b        |3    |
| 2    | d        |1    |
| 3    | c        |5    |

And then I would like it to look like this:

| ID   | location |original value|estimated|
| ---- | ---------|--------------|---------|
| 1    | a        |5             |7        |
| 1    | a        |2             |7        |
| 2    | b        |3             |3        |
| 2    | d        |1             |1        |
| 3    | c        |2             |5        |
| 3    | c        |3             |5        |

Upvotes: 1

Views: 911

Answers (1)

Shawn Hemelstrand

Reputation: 3228

Solution

As highlighted in the comments below by GuedesBF, "summarise will output one value/line per group, while mutate will create a column with values calculated based on grouping, but with one value for every initial row, without modifying the original data's dimensions." As such, I replicated your data and used mutate instead:

#### Load Library ####
library(tidyverse)

#### Create Data ####
id <- c(1,1,2,2,3,3)
location <- c("a","a","b","d","c","c")
value <- c(5,2,3,1,2,3)
df <- data.frame(id,
                 location,
                 value)

#### Make Grouped Sum ####
df %>% 
  group_by(id, location) %>%
  mutate(grouped.value=sum(value))

Which gives you this:

# A tibble: 6 × 4
# Groups:   id, location [4]
     id location value grouped.value
  <dbl> <chr>    <dbl>         <dbl>
1     1 a            5             7
2     1 a            2             7
3     2 b            3             3
4     2 d            1             1
5     3 c            2             5
6     3 c            3             5
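Note that the result above is still grouped (see the `# Groups:` line). If the grouping should not carry over into later steps, one option is to add ungroup() after mutate. The following is a minimal sketch, assuming the same data as above (recreated here so the block runs on its own). The column names `original_value` and `estimated` are only illustrative, chosen to match the table in the question, and the `.by` variant assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Same data as above, recreated so this block is self-contained
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  location = c("a", "a", "b", "d", "c", "c"),
  value = c(5, 2, 3, 1, 2, 3)
)

# Grouped sum repeated on every row, then drop the grouping
result <- df %>%
  group_by(id, location) %>%
  mutate(estimated = sum(value)) %>%
  ungroup() %>%
  rename(original_value = value)

# Alternative (assumes dplyr >= 1.1.0): per-operation grouping with .by,
# which never leaves the result grouped
result_by <- df %>%
  mutate(estimated = sum(value), .by = c(id, location)) %>%
  rename(original_value = value)
```

Both versions keep all six original rows and repeat each group's sum on every row of that group.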

Side Note on Ungrouping

Technically there is a function called ungroup for this purpose:

df %>% 
  group_by(id, location) %>%
  summarize(grouped.value=sum(value)) %>% 
  ungroup()

But it wouldn't help here with summarise, because summarise has already reduced the data frame to one row per group:

`summarise()` has grouped output by 'id'. You can override using
the `.groups` argument.
# A tibble: 4 × 3
     id location grouped.value
  <dbl> <chr>            <dbl>
1     1 a                    7
2     2 b                    3
3     2 d                    1
4     3 c                    5
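If you do want to go through summarise, one possible way to get back to the original rows is to join the summary onto the original data. This is a sketch of that alternative rather than the approach used in this answer; the object names `sums` and `joined` are made up for illustration.

```r
library(dplyr)

# Same data as above, recreated so this block is self-contained
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  location = c("a", "a", "b", "d", "c", "c"),
  value = c(5, 2, 3, 1, 2, 3)
)

# One row per group; .groups = "drop" returns an ungrouped result
# and silences the message shown above
sums <- df %>%
  group_by(id, location) %>%
  summarise(grouped.value = sum(value), .groups = "drop")

# Attach each group's sum back to every original row
joined <- left_join(df, sums, by = c("id", "location"))
```

Here `joined` has the same six rows as `df` plus the `grouped.value` column, which is the same result mutate gives in a single step.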

An example of ungrouping in action can be seen below, where we create a grouped sum variable, ungroup, and then create a new ratio variable that we don't want to be calculated within the groups:

iris %>% 
  select(Species,
         Petal.Length, 
         Petal.Width) %>% 
  group_by(Species) %>% 
  mutate(Group.Sum.Petals = sum(Petal.Length)) %>% 
  ungroup() %>% 
  mutate(Width.Length.Ratio = Petal.Length/Petal.Width)

This gives us what we want. The rows shown are all for the setosa species: the grouped sum is the same in every row, while the ratio is calculated separately for each observation:

# A tibble: 150 × 5
   Species Petal.Length Petal.Width Group.Sum.Petals Width.Length.…¹
   <fct>          <dbl>       <dbl>            <dbl>           <dbl>
 1 setosa           1.4         0.2             73.1            7   
 2 setosa           1.4         0.2             73.1            7   
 3 setosa           1.3         0.2             73.1            6.5 
 4 setosa           1.5         0.2             73.1            7.5 
 5 setosa           1.4         0.2             73.1            7   
 6 setosa           1.7         0.4             73.1            4.25
 7 setosa           1.4         0.3             73.1            4.67
 8 setosa           1.5         0.2             73.1            7.5 
 9 setosa           1.4         0.2             73.1            7   
10 setosa           1.5         0.1             73.1           15   
# … with 140 more rows, and abbreviated variable name
#   ¹Width.Length.Ratio
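A ratio of two columns is computed row by row, so it would come out the same with or without ungroup(). The effect of ungrouping is easier to see with an aggregate such as sum(). The sketch below uses the same iris data; the column names `Share.In.Species` and `Share.Overall` are hypothetical and only there for illustration.

```r
library(dplyr)

shares <- iris %>%
  select(Species, Petal.Length) %>%
  group_by(Species) %>%
  # While grouped, sum() is taken within each species
  mutate(Share.In.Species = Petal.Length / sum(Petal.Length)) %>%
  ungroup() %>%
  # After ungroup(), sum() is taken over all 150 rows
  mutate(Share.Overall = Petal.Length / sum(Petal.Length))
```

The first share adds up to 1 within each species, while the second adds up to 1 across the whole data set.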

To summarize, group_by attaches grouping information to your data so that later operations are calculated per group, whereas ungroup removes that information so that later operations apply to the data frame as a whole.
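If you are ever unsure whether a data frame is still grouped, you can inspect it directly. A small sketch using dplyr's group_vars() and is_grouped_df(); the object names `grouped` and `plain` are just for illustration.

```r
library(dplyr)

grouped <- iris %>% group_by(Species)
group_vars(grouped)     # names of the grouping columns
is_grouped_df(grouped)  # whether grouping is attached

plain <- grouped %>% ungroup()
group_vars(plain)       # no grouping columns remain
is_grouped_df(plain)
```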

Upvotes: 4
