Reputation: 29
Sorry if this is a duplicate; I don't really know how to phrase my request. I work in R and would like to identify data frame cells that contain a certain character exactly once.
In my df I have a column a that contains formulas stored as strings, e.g.
# a
1 y~x1+x2
2 y~x2+x3
3 y~x1+x2+x3
4 y~x2+x4
5 y~x1+x3+x4
and I would like to keep the rows whose formulas in column a have 2 explanatory variables, i.e. that contain only one "+". The idea would be to filter, and to add a kind of dummy variable, so that the output would look like
# a b
1 y~x1+x2 1
2 y~x2+x3 1
3 y~x1+x2+x3 0
4 y~x2+x4 1
5 y~x1+x3+x4 0
Hope that's clear enough. Thanks for helping,
Val
Upvotes: 1
Views: 235
Reputation: 34441
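A third base R alternative, assuming there are always at least two predictors in the formula. The pattern matches strings containing two or more "+", so negating it flags the formulas with at most one.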
df$b <- +(!grepl("\\+.*\\+", df$a))
df
a b
1 y~x1+x2 1
2 y~x2+x3 1
3 y~x1+x2+x3 0
4 y~x2+x4 1
5 y~x1+x3+x4 0
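The assumption matters: the pattern only tests for two or more "+", so a formula with a single predictor (no "+" at all) would also be flagged 1. A small sketch of the difference, using a made-up one-predictor formula y~x1 that is not in the original data (the variable names are illustrative only):

```r
# Hypothetical input: the first formula has no "+" at all
a <- c("y~x1", "y~x1+x2", "y~x1+x2+x3")

# Pattern from the answer: 1 unless the string contains two or more "+"
at_most_one <- +(!grepl("\\+.*\\+", a))

# Stricter variant: additionally require at least one literal "+"
exactly_one <- +(grepl("+", a, fixed = TRUE) & !grepl("\\+.*\\+", a))

at_most_one
exactly_one
```

If every formula is known to contain at least one "+", the two versions agree and the shorter one is enough.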
Upvotes: 1
Reputation: 39657
You can use gsub with [^+] to remove every character except + and nchar to count what remains.
x$b <- +(nchar(gsub("[^+]", "", x$a)) == 1)
x
# a b
#1 y~x1+x2 1
#2 y~x2+x3 1
#3 y~x1+x2+x3 0
#4 y~x2+x4 1
#5 y~x1+x3+x4 0
Or use gregexpr:
lapply(gregexpr("\\+", x$a), length) == 1
#[1] TRUE TRUE FALSE TRUE FALSE
Or using it with lengths as suggested by @ThomasIsCoding:
lengths(gregexpr("\\+", x$a)) == 1
#[1] TRUE TRUE FALSE TRUE FALSE
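One caveat with counting via gregexpr: when there is no match it returns -1, which still has length 1, so a string without any "+" would also pass the == 1 test. This does not affect the example data, where every formula has at least one "+". A sketch of a count that handles the no-match case, with count_plus as a hypothetical helper name and y~x1 as a made-up input:

```r
# Hypothetical helper: count literal "+" characters in each string
count_plus <- function(s) {
  m <- gregexpr("+", s, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so count only real positions
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

a <- c("y~x1", "y~x1+x2", "y~x1+x2+x3")
lengths(gregexpr("\\+", a))  # the no-match case is reported as length 1
count_plus(a) == 1           # TRUE only where there is exactly one "+"
```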
Or using grepl:
grepl("^[^+]*\\+[^+]*$", x$a)
#[1] TRUE TRUE FALSE TRUE FALSE
Or with strsplit:
sapply(strsplit(x$a, ""), function(y) sum(y == "+")==1)
#[1] TRUE TRUE FALSE TRUE FALSE
Data:
x <- read.table(header=TRUE, text="a
1 y~x1+x2
2 y~x2+x3
3 y~x1+x2+x3
4 y~x2+x4
5 y~x1+x3+x4", stringsAsFactors = FALSE)
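The question also asks to keep only the matching rows. A minimal sketch of that step, rebuilding the same data and reusing the anchored grepl pattern from above (the names keep and x_two are illustrative, not from the original post):

```r
x <- data.frame(a = c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4", "y~x1+x3+x4"),
                stringsAsFactors = FALSE)

# TRUE where the formula contains exactly one "+"
keep <- grepl("^[^+]*\\+[^+]*$", x$a)

x$b <- +keep                       # 0/1 dummy column
x_two <- x[keep, , drop = FALSE]   # only the two-predictor formulas
x_two
```

Using drop = FALSE keeps the result a data frame even if x had only one column.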
Upvotes: 3
Reputation: 101335
Another base R solution uses gregexpr, i.e.,
df$b <- +(lengths(gregexpr("\\+",df$a))==1)
such that
> df
a b
1 y~x1+x2 1
2 y~x2+x3 1
3 y~x1+x2+x3 0
4 y~x2+x4 1
5 y~x1+x3+x4 0
DATA
df <- structure(list(a = c("y~x1+x2", "y~x2+x3", "y~x1+x2+x3", "y~x2+x4",
"y~x1+x3+x4")), class = "data.frame", row.names = c("1", "2",
"3", "4", "5"))
Upvotes: 1