DN1

Reputation: 218

How to sum numbers considered as strings?

I have a column in my dataset like this:

col1
1
1, 1, 1, 1
1, 1
1, 1, 1, 1, 1
1

I am trying to sum each row in a new column like this output:

col2
1
4
2
5
1

I have tried doing:

rowSums(as.numeric(as.character(df$col1)))
Error in rowSums(as.numeric(as.character(df$col1))) : 
  'x' must be an array of at least two dimensions
In addition: Warning message:
In is.data.frame(x) : NAs introduced by coercion

I am new to R and am likely missing something obvious, but I can't find a similar problem online that I could adapt to my data. Any help or advice on which functions to use would be appreciated.

Data:

df <- structure(list(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1")), 
row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2

Views: 122

Answers (5)

akrun

Reputation: 887251

We can read the strings with read.csv and then apply rowSums, all in base R:

rowSums(read.csv(text = df1$col1, fill = TRUE, header = FALSE), na.rm = TRUE)
#[1] 1 4 2 5 1
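
read.csv infers the number of columns from the first five lines of input, so on a larger dataset a longer row further down may not be handled as expected. A minimal sketch, assuming the same df1 as in this answer, that passes col.names explicitly instead of relying on that inference (n_fields and wide are illustrative names, not part of the original answer):

```r
# Same data as in this answer
df1 <- data.frame(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1"),
                  stringsAsFactors = FALSE)

# Largest number of comma-separated fields in any row
n_fields <- max(lengths(strsplit(df1$col1, ",", fixed = TRUE)))

# Read every string as one CSV line; short rows are padded with NA
wide <- read.csv(text = df1$col1, header = FALSE, fill = TRUE,
                 col.names = paste0("V", seq_len(n_fields)))

# Sum across the padded columns, skipping the NA padding
df1$col2 <- rowSums(wide, na.rm = TRUE)
```

For the five-row example this gives the same sums as the one-liner above; the explicit col.names should only matter when the widest row is not among the first five.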

data

df1 <- structure(list(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", 
 "1")), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 0

s_baldur

Reputation: 33488

Using stringr:

library(stringr)

# assumes we are only summing integers, ignores decimals
sapply(str_extract_all(df$col1, "[0-9]+"), function(x) sum(as.integer(x)))
[1] 1 4 2 5 1


# Assumes we are only looking for the integer 1
str_count(df$col1, "1")
[1] 1 4 2 5 1
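
If the values could contain decimals, the pattern can be widened. A rough sketch in base R (regmatches and gregexpr instead of stringr), using a made-up input vector x that is not from the question; it assumes non-negative numbers written with a dot as the decimal mark:

```r
# Hypothetical input with decimal values (not the question's data)
x <- c("1.5, 2", "0.25, 0.25, 1", "3")

# Match digits, optionally followed by a dot and more digits
nums <- regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))

# Convert each set of matches to numeric and sum per element
totals <- vapply(nums, function(v) sum(as.numeric(v)), numeric(1))
```

Negative numbers and scientific notation are not covered by this pattern.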

Upvotes: 1

Clemsang

Reputation: 5481

You can use sapply. strsplit splits each string at the commas, so you get only the digits you want; then convert them from character to numeric and sum them:

df$col2 <- sapply(strsplit(df$col1, ","), function(x) sum(as.numeric(x)))
df$col2

[1] 1 4 2 5 1
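
The same idea broken into steps, as a sketch that rebuilds the question's data so it runs on its own (parts is just an illustrative name). Splitting on "," alone leaves a leading space on some pieces, which as.numeric ignores; vapply is swapped in for sapply only to make the return type explicit:

```r
# Same data as in the question
df <- data.frame(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1"),
                 stringsAsFactors = FALSE)

# Step 1: split each string at the commas; the result is a list of
# character vectors, one per row
parts <- strsplit(df$col1, ",")

# Step 2: convert each piece to numeric (surrounding whitespace is
# ignored) and sum within each row
df$col2 <- vapply(parts, function(x) sum(as.numeric(x)), numeric(1))
```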

Upvotes: 4

Sotos

Reputation: 51592

One idea is to use eval(parse) after you replace , with +, i.e.

sapply(gsub(', ', '+', d3$col1, fixed = TRUE), function(i)eval(parse(text = i)))
#        1   1+1+1+1       1+1 1+1+1+1+1         1 
#        1         4         2         5         1 

Another is to split and sum,

sapply(strsplit(d3$col1, ', '), function(i)sum(as.numeric(i)))
#[1] 1 4 2 5 1

However, if you only have 1s to sum, you can simply count them. Using stringr:

stringr::str_count(d3$col1, '1')
[1] 1 4 2 5 1
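
To see why the counting shortcut only works when every value is exactly 1, here is a small comparison on made-up values (x is hypothetical, not the question's data). It counts with base R instead of stringr so it runs without extra packages:

```r
# Hypothetical values where counting and summing disagree
x <- c("1, 1", "1, 2", "11")

# Number of "1" characters in each string
counts <- lengths(regmatches(x, gregexpr("1", x, fixed = TRUE)))

# Actual sums after splitting on ", "
sums <- vapply(strsplit(x, ", ", fixed = TRUE),
               function(v) sum(as.numeric(v)), numeric(1))
```

Only the first element agrees between counts and sums; the split-and-sum version is the safer choice when other values may appear.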

Upvotes: 3

tmfmnk

Reputation: 39858

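One dplyr and tidyr option (with tibble for rowid_to_column) could be:

library(dplyr)
library(tidyr)
library(tibble)  # for rowid_to_column()

df %>%
 rowid_to_column() %>%
 separate_rows(col1, sep = ", ", convert = TRUE) %>%
 group_by(rowid) %>%
 summarise_all(sum)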

  rowid  col1
  <int> <int>
1     1     1
2     2     4
3     3     2
4     4     5
5     5     1

Or a quite handy option involving splitstackshape:

library(splitstackshape)

rowSums(cSplit(df, "col1"), na.rm = TRUE)

Upvotes: 2
