antonina
antonina

Reputation: 109

How to sum df when it contains characters?

I am trying to prep my data and I am stuck with one issue. Lets say I have the following data frame:

df1
Name    C1    Val1
A       a       x1
A       a       x2
A       b       x3
A       c       x4
B       d       x5
B       d       x6
...

and I want to narrow down the df to

df2
Name    C1     Val
A       a,b,c  x1+x2+x3+x4
B       d      x5+x6
...

while a is a character value and x is numeric value I have been trying using sapply, rowsum and df2<- aggregate(df1, list(df1[,1]), FUN= summary) but it just can't put the character values in a list for each Name.

Can someone help me how to receive df2?

Upvotes: 0

Views: 527

Answers (3)

JBGruber
JBGruber

Reputation: 12420

Quick and easy dplyr solution:

    library(dplyr)
library(stringr)
df1 %>%
  mutate(Val1_num = as.numeric(str_extract(Val1, "\\d+"))) %>% 
  group_by(Name) %>% 
  summarise(C1 = paste(unique(C1), collapse = ","),
            Val1 = paste(unique(Val1), collapse = "+"),
            Val1_num = sum(Val1_num))
#> # A tibble: 2 x 4
#>   Name  C1    Val1        Val1_num
#>   <chr> <chr> <chr>          <dbl>
#> 1 A     a,b,c x1+x2+x3+x4       10
#> 2 B     d     x5+x6             11

Or in base:

df2 <- aggregate(df1, list(df1[,1]), FUN = function(x) {
  if (all(grepl("\\d", x))) {
    sum(as.numeric(gsub("[^[:digit:]]", "", x)))
  } else {
    paste(unique(x), collapse = ",")
  }
}) 

df2
#>   Group.1 Name    C1 Val1
#> 1       A    A a,b,c   10
#> 2       B    B     d   11

data

df1 <- read.csv(text = "
Name,C1,Val1
A,a,x1
A,a,x2
A,b,x3
A,c,x4
B,d,x5
B,d,x6", stringsAsFactors = FALSE)

Upvotes: 0

StupidWolf
StupidWolf

Reputation: 46908

This is your df, I give it numbers 1 to 6 in Val1

df <-
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), C1 = structure(c(1L, 1L, 2L, 3L, 4L, 
4L), .Label = c("a", "b", "c", "d"), class = "factor"), Val1 = 1:6), row.names = c(NA, 
-6L), class = "data.frame")

We just use summarise:

df %>% 
group_by(Name) %>% 
summarise(C1=paste(unique(C1),collapse=","),Val1=sum(Val1))

# A tibble: 2 x 3
  Name  C1     Val1
  <fct> <chr> <int>
1 A     a,b,c    10
2 B     d        11

Upvotes: 0

Onyambu
Onyambu

Reputation: 79228

m <- function(x) if(is.numeric(x<- type.convert(x)))sum(x) else toString(unique(x))
aggregate(.~Name,df1,m)
  Name      C1 Val1
1    A a, b, c   10
2    B       d   11

where

df1
Name C1 Val1
1    A  a    1
2    A  a    2
3    A  b    3
4    A  c    4
5    B  d    5
6    B  d    6

Upvotes: 1

Related Questions