Reputation: 218
I have a column in my dataset like this:
col1
1
1, 1, 1, 1
1, 1
1, 1, 1, 1, 1
1
I am trying to sum each row in a new column like this output:
col2
1
4
2
5
1
I have tried doing:
rowSums(as.numeric(as.character(df$col1)))
Error in rowSums(as.numeric(as.character(df$col1))) :
'x' must be an array of at least two dimensions
In addition: Warning message:
In is.data.frame(x) : NAs introduced by coercion
I am new to R and am likely missing something obvious, but I can't find a similar problem online, in R, that I could adapt to my data. Any help or advice on which functions to use would be appreciated.
Data:
df <- structure(list(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1")),
row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2
Views: 122
Reputation: 887251
We can read with read.csv and use rowSums in base R:
rowSums(read.csv(text = df1$col1, fill = TRUE, header = FALSE), na.rm = TRUE)
#[1] 1 4 2 5 1
df1 <- structure(list(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1",
"1")), class = "data.frame", row.names = c(NA, -5L))
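To see why na.rm = TRUE matters in this approach, it can help to inspect the intermediate data frame. The following is only a sketch, assuming the same data as df1 above; the name wide is introduced here purely for illustration.

```r
df1 <- data.frame(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1"),
                  stringsAsFactors = FALSE)

# Each string is parsed as one CSV line; fill = TRUE pads short lines with NA
wide <- read.csv(text = df1$col1, fill = TRUE, header = FALSE)

dim(wide)           # rows x columns of the parsed table
any(is.na(wide))    # TRUE here, because the shorter rows were padded

# Without na.rm = TRUE, every padded row would sum to NA
rowSums(wide, na.rm = TRUE)
```

The read.table documentation says the number of columns is inferred from the first five lines of input, so with longer data it may be safer to pass col.names explicitly.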
Upvotes: 0
Reputation: 33488
Using stringr:
library(stringr)
# assumes we are only summing integers, ignores decimals
sapply(str_extract_all(df$col1, "[0-9]+"), function(x) sum(as.integer(x)))
[1] 1 4 2 5 1
# Assumes we are only looking for the integer 1
str_count(df$col1, "1")
[1] 1 4 2 5 1
Upvotes: 1
Reputation: 5481
You can use sapply. strsplit splits each string into its individual numbers; you then convert them from character to numeric and sum them:
df$col2 <- sapply(strsplit(df$col1, ","), function(x) sum(as.numeric(x)))
df$col2
[1] 1 4 2 5 1
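A slightly stricter variant of the same idea, as a sketch only: it splits on the comma together with any whitespace after it, and it swaps sapply for vapply so that R checks each result is a single number. The name parts is an illustrative assumption, not part of the answer above.

```r
df <- data.frame(col1 = c("1", "1, 1, 1, 1", "1, 1", "1, 1, 1, 1, 1", "1"),
                 stringsAsFactors = FALSE)

# Split on a comma plus any following whitespace
parts <- strsplit(df$col1, ",\\s*")

# vapply works like sapply but enforces one numeric value per row
df$col2 <- vapply(parts, function(x) sum(as.numeric(x)), numeric(1))
df$col2
```

For the example data this should give the same col2 as the sapply version.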
Upvotes: 4
Reputation: 51592
One idea is to use eval(parse) after you replace ", " with "+", i.e.
sapply(gsub(', ', '+', d3$col1, fixed = TRUE), function(i)eval(parse(text = i)))
# 1 1+1+1+1 1+1 1+1+1+1+1 1
# 1 4 2 5 1
Another is to split and sum,
sapply(strsplit(d3$col1, ', '), function(i)sum(as.numeric(i)))
#[1] 1 4 2 5 1
However, if you only have 1s to sum, then you can simply count them. Using stringr,
stringr::str_count(d3$col1, '1')
[1] 1 4 2 5 1
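Counting only matches the sum when every value is exactly 1. The sketch below uses made-up data (the vector x is hypothetical and contains an 11) and a base R stand-in for str_count, so it runs without stringr.

```r
x <- c("1", "1, 1", "11, 1")  # hypothetical data; the last row contains 11

# Count occurrences of the character "1" (base R analogue of str_count)
counts <- lengths(regmatches(x, gregexpr("1", x, fixed = TRUE)))

# Split and sum the actual values
sums <- sapply(strsplit(x, ", "), function(i) sum(as.numeric(i)))

counts  # 1 2 3
sums    # 1 2 12
```

The two results diverge on the last row, so the counting shortcut is only safe when the column is known to hold nothing but single 1s.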
Upvotes: 3
Reputation: 39858
One dplyr and tidyr option could be:
library(dplyr)
library(tidyr)
library(tibble)  # provides rowid_to_column()

df %>%
  rowid_to_column() %>%
  separate_rows(col1, sep = ", ", convert = TRUE) %>%
  group_by(rowid) %>%
  summarise_all(sum)
  rowid  col1
  <int> <int>
1     1     1
2     2     4
3     3     2
4     4     5
5     5     1
Or a quite handy option involving splitstackshape:
library(splitstackshape)
rowSums(cSplit(df, "col1"), na.rm = TRUE)
Upvotes: 2