Zlo

Reputation: 1170

dplyr summarize by string

I have a dataframe that has numeric and string values, for example:

mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

Each value of the id variable always corresponds to the same value of the text variable, e.g., id == 1 will always have text == 'A'.

Now, I want to summarize this dataframe by id (or by text, since it's the same thing):

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value))

This works nicely, but I also need the text variable, since I want to do text analysis.

However, when I add text to the dplyr pipe:

mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = text)

I get the following error:

Error: expecting a single value

Since text is always the same for a given id, is it possible to append it to the summarized dataframe?

Upvotes: 2

Views: 11625

Answers (2)

CSV

Reputation: 849

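Instead of summarise, which would reduce your df to a data frame with only two columns (id and mean_value), use mutate so that you keep the other variables. Note that mutate does not collapse the groups: every original row is kept, and the group mean is repeated on each row of its group.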

mydf %>%
  group_by(id) %>%
  mutate(mean_value = mean(value))
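
If one row per id is still wanted after mutate, a possible follow-up is to drop the repeated rows with distinct. This is only a sketch using the example data from the question; the names with_mean and per_id are made up for illustration.

```r
library(dplyr)

# example data from the question
mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

# mutate() keeps all six rows and repeats the group mean on each of them
with_mean <- mydf %>%
  group_by(id) %>%
  mutate(mean_value = mean(value)) %>%
  ungroup()

# keep only id, text and mean_value, one row per unique combination
per_id <- with_mean %>%
  distinct(id, text, mean_value)
```

Here with_mean still has six rows, while per_id has one row per id, provided each id really has a single text value.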

Upvotes: 0

zx8754

Reputation: 56004

summarize needs every expression to return a single value per group, and plain text returns all of the group's values. So we can either keep text out of summarize and put it next to id within group_by, or use the first function within summarize:

# text should be in group_by to show up in result
mydf %>%
  group_by(id, text) %>%
  summarize(mean_value = mean(value))

# or within summarise use first function, to take the first value when grouped
mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
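
Both options rely on the question's statement that each id has exactly one text value. As a rough sketch (the names check and result are illustrative, not part of the original answer), that assumption can be verified with n_distinct before taking the first value:

```r
library(dplyr)

# example data from the question
mydf <- data.frame(id = c(1, 2, 1, 2, 3, 4),
                   value = c(32, 12, 43, 6, 50, 20),
                   text = c('A', 'B', 'A', 'B', 'C', 'D'))

# count distinct text values per id; a count above 1 would mean
# first(text) silently hides some of the values
check <- mydf %>%
  group_by(id) %>%
  summarize(n_text = n_distinct(text))

stopifnot(all(check$n_text == 1))

# safe to take the first text value of each group
result <- mydf %>%
  group_by(id) %>%
  summarize(mean_value = mean(value),
            text = first(text))
```

If the check fails, grouping by both id and text shows every combination instead of dropping any.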

Upvotes: 5
