DCRubyHound
DCRubyHound

Reputation: 343

Directing results from DPLYR to designated variables

I'd like to see if there is a more compact way to complete the following task using DPLYR. I want to go from this:

id  name    value   average max min
1   chris   8           
1   chris   5           
1   chris   3           
1                   
1           2           
2   tom     12          
2   tom     6           
2   tom     4           
2   tom             
2                   

to this:

id  name    value   average max min
1   chris   8       4.5     8   2
1   chris   5       4.5     8   2
1   chris   3       4.5     8   2
1                   4.5     8   2
1           2       4.5     8   2
2   tom     12      7.3     12  4
2   tom     6       7.3     12  4
2   tom     4       7.3     12  4
2   tom             7.3     12  4
2                   7.3     12  4

The values are grouped by id. Right now I do the following series of commands:

Step 1: take the average by id and create a new df:

library(dplyr)
new_df <- df %>%
  group_by(id) %>%
  summarise_each(funs(mean(value, na.rm=TRUE)))

Step 2: Remove all variables except average in new_df

Step 3: Merge new_df back to master

Is there a way to direct the results from DPLYR directly into a variable based on an ID? Thank you.

New Code

df <- df %>%
  group_by(id) %>%
  mutate_each(funs(average = mean(value, na.rm = TRUE),
                   min = min(value, na.rm = TRUE),
                   max = max(value, na.rm = TRUE)))

Data Frame Code

id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2) 
name = c("chris", "chris", "chris", "", "", "tom","tom","tom","tom","") 
value = c(8,5,3,"",2,12,6,4,"","") 
average = c(1:10)
max = c("","","","","","","","","","")
min = c("","","","","","","","","","")
df = data.frame(id, name, value, average, max, min)

Upvotes: 2

Views: 355

Answers (3)

leerssej
leerssej

Reputation: 14958

To avoid the factor trap, use the tibble wrapper data_frame to build the data frame.

df = data_frame(id, name, value, average, max, min)

since the value column is character typed due to the presence of "", it needs to be turned to a numeric. Happily, this also transforms your ""s into NA's. then (in just one step) screen out the NA's from the calculations with the mutate/summarise friendly na.omit().

df1 <- 
    select(df, id, value) %>%
    mutate(value = as.numeric(value)) %>%
    na.omit %>% 
    group_by(id) %>% 
    summarise(average = mean(value),
              max = max(value),
              min = min(value)
             ) %>% 
    right_join(select(df, id, name, value), .)

> df1
# A tibble: 10 × 6
      id  name value  average   max   min
   <dbl> <chr> <chr>    <dbl> <dbl> <dbl>
1      1 chris     8 4.500000     8     2
2      1 chris     5 4.500000     8     2
3      1 chris     3 4.500000     8     2
4      1             4.500000     8     2
5      1           2 4.500000     8     2
6      2   tom    12 7.333333    12     4
7      2   tom     6 7.333333    12     4
8      2   tom     4 7.333333    12     4
9      2   tom       7.333333    12     4
10     2             7.333333    12     4

Upvotes: 3

Jacob
Jacob

Reputation: 3557

How about:

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(average = mean(value, na.rm = TRUE),
         min = min(value, na.rm = TRUE),
         max = max(value, na.rm = TRUE)) %>% 
  left_join(df)

Upvotes: 0

Wojciech Książek
Wojciech Książek

Reputation: 709

Probably like this:

library(dplyr)
df <- df %>%
  group_by(id) %>%
  mutate(average = mean(value, na.rm = TRUE),
         min = min(value, na.rm = TRUE),
         max = max(value, na.rm = TRUE))

Upvotes: 4

Related Questions