user3022875
user3022875

Reputation: 9018

Pivoting data in R

I have a dataframe:

dat<- data.frame(date = c("2015-01-01","2015-01-01","2015-01-01", "2015-01-01","2015-02-02","2015-02-02","2015-02-02","2015-02-02","2015-02-02"), val= c(10,20,30,50,300,100,200,200,400), type= c("A","A","B","C","A","A","B","C","C") )
dat

       date val type
1 2015-01-01  10    A
2 2015-01-01  20    A
3 2015-01-01  30    B
4 2015-01-01  50    C
5 2015-02-02 300    A
6 2015-02-02 100    A
7 2015-02-02 200    B
8 2015-02-02 200    C
9 2015-02-02 400    C

and I would like to have one row for each day with averages by type so the output would be:

Date           A     B     C
2015-01-01    15     30    50
2015-02-02    200     200   300

additionally how would I get the counts so the results are:

Date           A     B     C
2015-01-01    2      1     1
2015-02-02    2      1     2

Upvotes: 5

Views: 10611

Answers (5)

MSR
MSR

Reputation: 2881

I'll add the pivot_wider solution, which is meant to replace earlier tidyverse options, and which is

Using pivot_wider with the values_fn option, we can do the following:

library(tidyr) # At least 1.0.0

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = mean))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300

and

dat %>% pivot_wider(names_from = type, values_from = val, values_fn = list(val = length))
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Of course, if we want to get fancy, we can do both at once:

library(purrr)
library(rlang)

map(quos(mean, length), 
    ~pivot_wider(dat, names_from = type, values_from = val, values_fn = list(val = eval_tidy(.))))
#> [[1]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <dbl> <dbl> <dbl>
#> 1 2015-01-01    15    30    50
#> 2 2015-02-02   200   200   300
#> 
#> [[2]]
#> # A tibble: 2 x 4
#>   date           A     B     C
#>   <fct>      <int> <int> <int>
#> 1 2015-01-01     2     1     1
#> 2 2015-02-02     2     1     2

Created on 2019-12-04 by the reprex package (v0.3.0)

Note that if you're concerned about speed, it may be worth updating to the dev version of tidyr.

Upvotes: 1

Arun
Arun

Reputation: 118779

Using data.table v1.9.5 (current devel), we can do:

require(data.table) ## v1.9.5+
dcast(setDT(dat), date ~ type, fun = list(mean, length), value.var="val")
#          date A_mean_val B_mean_val C_mean_val A_length_val B_length_val C_length_val
# 1: 2015-01-01         15         30         50            2            1            1
# 2: 2015-02-02        200        200        300            2            1            2

Installation instructions here.

Upvotes: 2

Gregor Thomas
Gregor Thomas

Reputation: 145755

library(reshape2)
dcast(data = dat, formula = date ~ type, fun.aggregate = mean, value.var = "val")

#         date   A   B   C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

With dcast, the LHS of the formula defines rows, the RHS defines columns, the value.var is the name of the column that becomes values, and the fun.aggregate is how those values are computed. The default fun.aggregate is length, i.e., the number of values. You asked for the average, so we use mean. You could also do min, max, sd, IQR, or any function that takes a vector and returns a single value.

Upvotes: 9

akrun
akrun

Reputation: 886938

You may also use table for the updated question

  table(dat[c(1,3)])
  #            type
  #date       A B C
  #2015-01-01 2 1 1
  #2015-02-02 2 1 2

For the first question, I think @Gregor's solution is the best (so far), a possible option with dplyr/tidyr would be

 library(dplyr)
 library(tidyr)
 dat %>%
    group_by(date,type) %>%
    summarise(val=mean(val)) %>% 
    spread(type, val)

Or a base R option would be (nchar=50 and the dcast(.. nchar=44. So not so bad :-))

  with(dat, tapply(val, list(date, type), FUN=mean))
  #            A   B   C
  #2015-01-01  15  30  50
  #2015-02-02 200 200 300

Upvotes: 8

Alex A.
Alex A.

Reputation: 5586

Personally I would go with Gregor's solution using reshape2. But for the sake of completeness I'll include a base R solution.

agg <- with(dat, aggregate(val, by = list(date = date, type = type), FUN = mean))

out <- reshape(agg, timevar = "type", idvar = "date", direction = "wide")

out
#         date x.A x.B x.C
# 1 2015-01-01  15  30  50
# 2 2015-02-02 200 200 300

If you want to get rid of the x. on the column names, you can remove it with gsub.

colnames(out) <- gsub("^x\\.", "", colnames(out))

To get the counts of rows, replace FUN = mean with FUN = length in the call to aggregate.

Upvotes: 4

Related Questions