Reputation: 1177
I want to summarise my data frame small for each distinct Video.ID using dplyr:
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = mean(Category))
mean(Category) is clearly the wrong approach, since Category is a character column. How do I make it use the single value that is repeated within each group? (A given Video.ID always has the same category, no matter how often it appears in the data frame.)
My data frame looks like this:
small
# A tibble: 6 x 7
     X1  X1_1 Video.ID    Video.Duration..sec. Category Owned.Views Partner.Revenue
  <int> <int> <chr>                      <int> <chr>          <int>           <dbl>
1     1     1 ---0zh9uzSE                 1184 gadgets            6               0
2     2     2 ---0zh9uzSE                 1184 gadgets            6               0
3     3     3 ---0zh9uzSE                 1184 gadgets            2               0
4     4     4 ---0zh9uzSE                 1184 gadgets            1               0
5     5     5 ---0zh9uzSE                 1184 gadgets            1               0
6     6     6 ---0zh9uzSE                 1184 gadgets            3               0
small <-
  structure(list(X1 = 1:6,
                 X1_1 = 1:6,
                 Video.ID = c("---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE", "---0zh9uzSE"),
                 Video.Duration..sec. = c(1184L, 1184L, 1184L, 1184L, 1184L, 1184L),
                 Category = c("gadgets", "gadgets", "gadgets", "gadgets", "gadgets", "gadgets"),
                 Owned.Views = c(6L, 6L, 2L, 1L, 1L, 3L),
                 Partner.Revenue = c(0, 0, 0, 0, 0, 0)),
            row.names = c(NA, -6L),
            class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 5
Views: 9155
Reputation: 6222
Since each Video.ID has exactly one category, you can take the first value in each group with cat = Category[1], as in:
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = Category[1])
# A tibble: 1 x 4
#   Video.ID     sumr   len cat
#   <chr>       <dbl> <dbl> <chr>
# 1 ---0zh9uzSE     0  1184 gadgets
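An equivalent spelling uses dplyr's first(), which some may find more explicit than indexing with [1]. A minimal sketch on made-up data; the toy data frame and its values are hypothetical and not from the question:

```r
library(dplyr)

# Hypothetical data: two videos, each with a constant category.
toy <- data.frame(
  Video.ID = c("a", "a", "b"),
  Category = c("gadgets", "gadgets", "music"),
  Partner.Revenue = c(1, 2, 5),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            cat = first(Category))  # same result as Category[1]
```

Both forms silently pick the first row's value, so they are only safe when the category really is constant within each Video.ID.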
Upvotes: 1
Reputation: 7724
You have at least two options to solve this:
Add the Category column to your group_by():
small %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.))
# A tibble: 1 x 4
# Groups:   Video.ID [?]
#   Video.ID    cat      sumr   len
#   <chr>       <chr>   <dbl> <dbl>
# 1 ---0zh9uzSE gadgets     0  1184
Or use unique(Category):
small %>%
  group_by(Video.ID) %>%
  summarise(sumr = sum(Partner.Revenue),
            len = mean(Video.Duration..sec.),
            cat = unique(Category))
# A tibble: 1 x 4
#   Video.ID     sumr   len cat
#   <chr>       <dbl> <dbl> <chr>
# 1 ---0zh9uzSE     0  1184 gadgets
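The first option might be preferred because it still works if you have multiple categories per ID: you then get one row per combination of Video.ID and category, whereas unique(Category) would produce more than one value for that group.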
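To illustrate the multiple-categories case, here is a small sketch on made-up data in which one video appears under two categories. The toy data frame is hypothetical, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Hypothetical data: video "b" appears under two different categories.
toy <- data.frame(
  Video.ID = c("a", "a", "b", "b"),
  Category = c("gadgets", "gadgets", "music", "news"),
  Partner.Revenue = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Grouping by both columns gives one row per (Video.ID, Category) pair.
by_pair <- toy %>%
  group_by(Video.ID, cat = Category) %>%
  summarise(sumr = sum(Partner.Revenue), .groups = "drop")

# Counting distinct categories per ID checks the one-category assumption.
check <- toy %>%
  group_by(Video.ID) %>%
  summarise(n_cat = n_distinct(Category))
```

Here by_pair has three rows (one for "a" and two for "b"), and check reports two categories for "b", which flags the IDs where Category[1] or unique(Category) would not be appropriate.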
Upvotes: 6