Reputation: 67
I have the following dataframe:
location asset_status count row
<chr> <chr> <dbl> <int>
1 location1 Owned 1 1
2 location1 Available 1 2
3 location1 Owned 1 3
4 location2 Owned 1 4
5 location2 Owned 1 5
6 location2 Owned 1 6
7 location2 Owned 1 7
8 location2 no status 1 8
9 location3 Owned 1 9
10 location3 Owned 1 10
When I try to spread using this, i get the following error:
df <- head(us_can_laptops,10) %>%
select(location,asset_status,count) %>%
#mutate(row = row_number()) %>% #excluded
group_by(location) %>%
spread(asset_status,count)
Error: Duplicate identifiers for rows (4, 5, 6, 7), (1, 3)
So as per the other questions related to this on SO, I added in a unique identifier with mutate:
df <- head(us_can_laptops,10) %>%
select(location,asset_status,count) %>%
mutate(row = row_number()) %>%
group_by(location) %>%
spread(asset_status,count)
But that returns this:
location row Available `no status` Owned
* <chr> <int> <dbl> <dbl> <dbl>
1 location2 4 NA NA 1
2 location2 5 NA NA 1
3 location2 6 NA NA 1
4 location2 7 NA NA 1
5 location2 8 NA 1 NA
6 location3 10 NA NA 1
7 location3 9 NA NA 1
8 location1 1 NA NA 1
9 location1 2 1 NA NA
10 location1 3 NA NA 1
And additionally, whenever I try a summarize call, it ruins my spread.
Here is the desired outcome:
location Available `no status` Owned
* <chr> <dbl> <dbl> <dbl>
1 location1 1 NA 2
2 location2 NA 1 4
3 location3 NA NA 2
Any help would be appreciated. I know this looks like a duplicate, but the answers to the following linked questions still don't resolve the issue for me: Spread function Error: Duplicate identifiers for rows [duplicate] Spread with duplicate identifiers for rows 1
I'm really looking for a solution while using dplyr, and not dcast
Upvotes: 0
Views: 75
Reputation: 2101
The following should work (at least gives desired output):
df <- structure(list(location = c("location1", "location1", "location1",
"location2", "location2", "location2", "location2", "location2",
"location3", "location3"), asset_status = c("Owned", "Available",
"Owned", "Owned", "Owned", "Owned", "Owned", "no status", "Owned",
"Owned"), count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
row = 1:10), row.names = c(NA, -10L), .Names = c("location",
"asset_status", "count", "row"), class = "data.frame")
library(dplyr)
library(tidyr)
df %>%
group_by(location, asset_status) %>%
summarise(count = sum(count)) %>%
spread(key = asset_status, value = count)
Upvotes: 2