dplyr summarize by string

Question

I have a dataframe that has numeric and string values, for example:

library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

Each value of the id variable always corresponds to the same value of the text variable, e.g., id == 1 will always have text == 'A'.

Now, I want to summarize this dataframe by id (or by text, since it's the same thing):

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value))

This works nicely, but I also need the text variable, since I want to do text analysis.

However, when I add text to the dplyr pipe:

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = text)

I get the following error:

Error: expecting a single value

Since text is always the same for a given id, is it possible to append it to the summarized dataframe?

zx8754 · Accepted Answer

summarize needs a function that reduces each group to a single value, so we can either keep text out of summarize and put it alongside id in group_by, or use the first function within summarize:

# include text in group_by so that it shows up in the result
mydf %>%
  group_by(id, text) %>%
  summarize(mean_value = mean(value))

# or use first() within summarize to take the first text value in each group
mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
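
Both approaches rely on the question's premise that each id maps to exactly one text value. If that is in doubt, a quick check before summarizing might look like the sketch below. The names check, n_text and result are illustrative only and not part of the original answer; n_distinct() is a dplyr function.

```r
library(dplyr)

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

# Count the distinct text values per id; a count above 1 would mean
# that first(text) silently drops information for that id
check <- mydf %>%
  group_by(id) %>%
  summarize(n_text = n_distinct(text))

# Stop early if any id has more than one text value
stopifnot(all(check$n_text == 1))

# The assumption holds, so taking the first text per id is safe
result <- mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
```

If the check fails, grouping by both id and text is probably the safer choice, since it keeps every id/text combination as its own row instead of discarding the extra text values.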
