Reputation: 45
Hi! I have the following question. `group_by` on a data frame can group rows and compute summary values, but is there a function to ungroup the data while keeping each computed value next to the original rows it was computed from? Here is an example:
base
| ID | location |value|
| ---- | ---------|-----|
| 1 | a |5 |
| 1 | a |2 |
| 2 | b |3 |
| 2 | d |1 |
| 3 | c |2 |
| 3 | c |3 |
group_by(base, ID, location) %>% summarise(value=sum(value))
The result is:
| ID | location |value|
| ---- | ---------|-----|
| 1 | a |7 |
| 2 | b |3 |
| 2 | d |1 |
| 3 | c |5 |
And then I would like it to look like this:
| ID | location |original value|estimated|
| ---- | ---------|--------------|---------|
| 1 | a |5 |7 |
| 1 | a |2 |7 |
| 2 | b |3 |3 |
| 2 | d |1 |1 |
| 3 | c |2 |5 |
| 3 | c |3 |5 |
Upvotes: 1
Views: 911
Reputation: 3228
As highlighted in the comments below by GuedesBF, "`summarise` will output one value/line per group, while `mutate` will create a column with values calculated based on grouping, but with one value for every initial row, without modifying the original data's dimensions." As such, I replicated your data and used `mutate` instead:
#### Load Library ####
library(tidyverse)
#### Create Data ####
id <- c(1,1,2,2,3,3)
location <- c("a","a","b","d","c","c")
value <- c(5,2,3,1,2,3)
df <- data.frame(id,
                 location,
                 value)
#### Make Grouped Sum ####
df %>%
  group_by(id, location) %>%
  mutate(grouped.value = sum(value))
Which gives you this:
# A tibble: 6 × 4
# Groups: id, location [4]
id location value grouped.value
<dbl> <chr> <dbl> <dbl>
1 1 a 5 7
2 1 a 2 7
3 2 b 3 3
4 2 d 1 1
5 3 c 2 5
6 3 c 3 5
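Note that the output above is still grouped (see the `# Groups:` line). A minimal sketch of how you could drop the grouping and match the column layout from your question is below; the names `estimated` and `original_value` are my own choices for illustration, not anything dplyr requires:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  location = c("a", "a", "b", "d", "c", "c"),
  value = c(5, 2, 3, 1, 2, 3)
)

result <- df %>%
  group_by(id, location) %>%
  mutate(estimated = sum(value)) %>%   # one value per original row
  ungroup() %>%                        # remove the grouping metadata
  rename(original_value = value)       # illustrative column name

result
```

The row count stays at six, and later verbs applied to `result` will no longer operate per group.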
There is a function called `ungroup`, but it only removes the grouping from a grouped data frame; it does not restore rows that were collapsed:
df %>%
  group_by(id, location) %>%
  summarize(grouped.value = sum(value)) %>%
  ungroup()
So it won't give you what you want after `summarise`, because `summarise` has already reduced the data frame to one row per group:
`summarise()` has grouped output by 'id'. You can override using
the `.groups` argument.
# A tibble: 4 × 3
id location grouped.value
<dbl> <chr> <dbl>
1 1 a 7
2 2 b 3
3 2 d 1
4 3 c 5
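If you already have the summarised table and still want the values next to the original rows, one option is to join it back onto the original data. This is only a sketch of that alternative, assuming the grouping columns (`id`, `location`) uniquely identify each group; `group_sums` and `estimated` are names I made up for the example:

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  location = c("a", "a", "b", "d", "c", "c"),
  value = c(5, 2, 3, 1, 2, 3)
)

# One row per group; .groups = "drop" returns an ungrouped result
group_sums <- df %>%
  group_by(id, location) %>%
  summarise(estimated = sum(value), .groups = "drop")

# Attach each group's sum to every original row of that group
result <- df %>%
  left_join(group_sums, by = c("id", "location"))

result
```

For this case `mutate` is shorter, but the join can be handy when the summary was computed elsewhere.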
An example of ungrouping in action can be seen below, where we create a grouped sum variable, ungroup, and then compute a new ratio variable that we don't want to be affected by the grouping:
iris %>%
  select(Species,
         Petal.Length,
         Petal.Width) %>%
  group_by(Species) %>%
  mutate(Group.Sum.Petals = sum(Petal.Length)) %>%
  ungroup() %>%
  mutate(Width.Length.Ratio = Petal.Length/Petal.Width)
Which gives us what we want. The ten rows shown are all from the setosa species, so the grouped sum is the same in every row, while the ratio is computed separately for each observation:
# A tibble: 150 × 5
Species Petal.Length Petal.Width Group.Sum.Petals Width.Length.…¹
<fct> <dbl> <dbl> <dbl> <dbl>
1 setosa 1.4 0.2 73.1 7
2 setosa 1.4 0.2 73.1 7
3 setosa 1.3 0.2 73.1 6.5
4 setosa 1.5 0.2 73.1 7.5
5 setosa 1.4 0.2 73.1 7
6 setosa 1.7 0.4 73.1 4.25
7 setosa 1.4 0.3 73.1 4.67
8 setosa 1.5 0.2 73.1 7.5
9 setosa 1.4 0.2 73.1 7
10 setosa 1.5 0.1 73.1 15
# … with 140 more rows, and abbreviated variable name
To summarize, `group_by` marks your data as grouped so that the verbs that follow operate within each group, whereas `ungroup` removes that marker so that later manipulations apply to the whole data frame.
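To make the difference concrete, here is a small sketch using the data from the question. The same `mutate` expression gives different results depending on whether the grouping is still attached; the column name `share` is just something I picked for illustration:

```r
library(dplyr)

df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  location = c("a", "a", "b", "d", "c", "c"),
  value = c(5, 2, 3, 1, 2, 3)
)

# Still grouped: sum(value) is computed within each id
grouped <- df %>%
  group_by(id) %>%
  mutate(share = value / sum(value))

# Ungrouped first: sum(value) is computed over all rows
ungrouped <- df %>%
  group_by(id) %>%
  ungroup() %>%
  mutate(share = value / sum(value))

group_vars(grouped)    # grouping columns still attached
group_vars(ungrouped)  # no grouping columns left
```

In `grouped`, the shares add up to 1 within each `id`; in `ungrouped`, they add up to 1 across the whole data frame.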
Upvotes: 4