Nikitau
Nikitau

Reputation: 371

dcast fun.aggregate=sum in R summing last available column incorrectly

I have data that looks roughly like this:

ID Months Value
1  1       500
1  2       300
1  3       250
1  4       100
2  1       500
2  2       500
2  3       300
3  1       400

And I'm trying to reformat it into short data format where it would display as

ID/Month 1   2    3   4
1        500 300  250 100
2        500 500  300 
3        400

I'm currently using the following:

dcast(data, ID ~ Month, fun.aggregate=sum, value.var="Value")

The problem is that this incorrectly double-sums the last available datapoint, and my output looks like this:

ID/Month 1   2    3   4
1        500 300  250 200
2        500 500  600 
3        800

I'm not exactly sure how best to troubleshoot or fix this. Unfortunately I can't share any of the data, but is there a way to bypass the "fun.aggregate" component and have it just list the numerical values from the "Value" field? I suspect this is the reason for the double-summations in the last datapoints.

Would appreciate any help!

Upvotes: 1

Views: 367

Answers (1)

Eric
Eric

Reputation: 2849

Data:

zz <- "ID Months Value
       1  1       500
       1  2       300
       1  3       250
       1  4       100
       2  1       500
       2  2       500
       2  3       300
       3  1       400"

data <- read.table(text=zz, header = TRUE)

Using the reshape2 package:

library(reshape2)

Code:

data %>% 
  dcast(ID ~ Months, sum) %>% 
  rename("ID/Month" = "ID")

Output:

ID/Month   1         2          3         4
<int>      <int>     <int>      <int>     <int>
1          500       300        250       100
2          500       500        300       0
3          400       0          0         0

Using the tidyverse package:

library(tidyverse)

Code:

data %>% 
  pivot_wider(names_from = "Months", values_from = "Value")

Output:

#> # A tibble: 3 x 5
#>      ID   `1`   `2`   `3`   `4`
#>   <int> <int> <int> <int> <int>
#> 1     1   500   300   250   100
#> 2     2   500   500   300    NA
#> 3     3   400    NA    NA    NA

Created on 2020-07-24 by the reprex package (v0.3.0)

Upvotes: 2

Related Questions