Reputation: 371
I have data that looks roughly like this:
ID Months Value
1 1 500
1 2 300
1 3 250
1 4 100
2 1 500
2 2 500
2 3 300
3 1 400
And I'm trying to reformat it into short data format where it would display as
ID/Month 1 2 3 4
1 500 300 250 100
2 500 500 300
3 400
I'm currently using the following:
dcast(data, ID ~ Month, fun.aggregate=sum, value.var="Value")
The problem is that this incorrectly double-sums the last available datapoint, and my output looks like this:
ID/Month 1 2 3 4
1 500 300 250 200
2 500 500 600
3 800
I'm not exactly sure how best to troubleshoot or fix this. Unfortunately I can't share any of the data, but is there a way to bypass the "fun.aggregate" component and have it just list the numerical values from the "Value" field? I suspect this is the reason for the double-summations in the last datapoints.
Would appreciate any help!
Upvotes: 1
Views: 367
Reputation: 2849
Data:
zz <- "ID Months Value
1 1 500
1 2 300
1 3 250
1 4 100
2 1 500
2 2 500
2 3 300
3 1 400"
data <- read.table(text=zz, header = TRUE)
Using the reshape2
package:
library(reshape2)
Code:
data %>%
dcast(ID ~ Months, sum) %>%
rename("ID/Month" = "ID")
Output:
ID/Month 1 2 3 4
<int> <int> <int> <int> <int>
1 500 300 250 100
2 500 500 300 0
3 400 0 0 0
Using the tidyverse
package:
library(tidyverse)
Code:
data %>%
pivot_wider(names_from = "Months", values_from = "Value")
Output:
#> # A tibble: 3 x 5
#> ID `1` `2` `3` `4`
#> <int> <int> <int> <int> <int>
#> 1 1 500 300 250 100
#> 2 2 500 500 300 NA
#> 3 3 400 NA NA NA
Created on 2020-07-24 by the reprex package (v0.3.0)
Upvotes: 2