Reputation: 343
I'd like to see if there is a more compact way to complete the following task using DPLYR. I want to go from this:
id name value average max min
1 chris 8
1 chris 5
1 chris 3
1
1 2
2 tom 12
2 tom 6
2 tom 4
2 tom
2
to this:
id name value average max min
1 chris 8 4.5 8 2
1 chris 5 4.5 8 2
1 chris 3 4.5 8 2
1 4.5 8 2
1 2 4.5 8 2
2 tom 12 7.3 12 4
2 tom 6 7.3 12 4
2 tom 4 7.3 12 4
2 tom 7.3 12 4
2 7.3 12 4
The values are grouped by id. Right now I do the following series of commands:
Step 1: take the average by id and create a new df:
library(dplyr)
new_df <- df %>%
group_by(id) %>%
summarise_each(funs(mean(value, na.rm=TRUE)))
Step 2: Remove all variables except average in new_df
Step 3: Merge new_df back to master
Is there a way to direct the results from DPLYR directly into a variable based on an ID? Thank you.
New Code
df <- df %>%
group_by(id) %>%
mutate_each(funs(average = mean(value, na.rm = TRUE),
min = min(value, na.rm = TRUE),
max = max(value, na.rm = TRUE)))
Data Frame Code
id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
name = c("chris", "chris", "chris", "", "", "tom","tom","tom","tom","")
value = c(8,5,3,"",2,12,6,4,"","")
average = c(1:10)
max = c("","","","","","","","","","")
min = c("","","","","","","","","","")
df = data.frame(id, name, value, average, max, min)
Upvotes: 2
Views: 355
Reputation: 14958
To avoid the factor trap, use the tibble wrapper data_frame
to build the data frame.
df = data_frame(id, name, value, average, max, min)
since the value column is character typed due to the presence of "", it needs to be turned to a numeric. Happily, this also transforms your ""s into NA's.
then (in just one step) screen out the NA's from the calculations with the mutate/summarise friendly na.omit()
.
df1 <-
select(df, id, value) %>%
mutate(value = as.numeric(value)) %>%
na.omit %>%
group_by(id) %>%
summarise(average = mean(value),
max = max(value),
min = min(value)
) %>%
right_join(select(df, id, name, value), .)
> df1
# A tibble: 10 × 6
id name value average max min
<dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 1 chris 8 4.500000 8 2
2 1 chris 5 4.500000 8 2
3 1 chris 3 4.500000 8 2
4 1 4.500000 8 2
5 1 2 4.500000 8 2
6 2 tom 12 7.333333 12 4
7 2 tom 6 7.333333 12 4
8 2 tom 4 7.333333 12 4
9 2 tom 7.333333 12 4
10 2 7.333333 12 4
Upvotes: 3
Reputation: 3557
How about:
library(dplyr)
df %>%
group_by(id) %>%
mutate(average = mean(value, na.rm = TRUE),
min = min(value, na.rm = TRUE),
max = max(value, na.rm = TRUE)) %>%
left_join(df)
Upvotes: 0
Reputation: 709
Probably like this:
library(dplyr)
df <- df %>%
group_by(id) %>%
mutate(average = mean(value, na.rm = TRUE),
min = min(value, na.rm = TRUE),
max = max(value, na.rm = TRUE))
Upvotes: 4