Reputation: 640
I'm new to R and blown away by the power it has at manipulating data quickly and returning readable information. For now, though, I'm stuck.
I have a large dataset that I've imported as a data frame. I'd like to search across specific columns of the data frame using regex (grepl?) and place the results of the search into a new column. I thought I could do this with apply or ddply, but I can't seem to wrap my mind around the functions well enough to do this.
Here's a sample data frame...
df <- structure(list(w = structure(c(3L, 2L, 1L, 3L, 3L), .Label = c("b",
"c", "d"), class = "factor"), x = structure(c(1L, 2L, 1L, 2L,
3L), .Label = c("a", "b", "d"), class = "factor"), y = structure(c(2L,
1L, 1L, 1L, 1L), .Label = c("a", "d"), class = "factor")), .Names = c("w",
"x", "y"), row.names = c(NA, -5L), class = "data.frame")
which returns...
w x y
1 d a d
2 c b a
3 b a a
4 d b a
5 d d a
I've tried: search <- apply(df, 2, function(x){grepl("d", x, perl=TRUE)})
(among other things), which returns:
w x y
[1,] TRUE FALSE TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,] TRUE FALSE FALSE
[5,] TRUE TRUE FALSE
What I'd like to have as a result is...
w x y z
1 d a d TRUE
2 c b a FALSE
3 b a a FALSE
4 d b a TRUE
5 d d a TRUE
I realize this seems very trivial to those of you who are advanced. Thanks in advance for taking the time to help me learn. Additionally, while I'm looking for an answer to this specific problem, I'd love to hear suggestions on things to study/read that will help me get a better grasp on this type of data manipulation.
Upvotes: 2
Views: 766
Reputation: 99331
You don't need regular expressions for an exact match like this. You can use rowSums.
When we use df == "d", every cell of the data frame is compared with "d", producing a logical matrix. Since TRUE counts as one and FALSE as zero in arithmetic, any row sum greater than zero means the row contains at least one "d".
> df$z <- rowSums(df == "d") > 0
> df
# w x y z
# 1 d a d TRUE
# 2 c b a FALSE
# 3 b a a FALSE
# 4 d b a TRUE
# 5 d d a TRUE
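To make the intermediate steps visible, here is a minimal sketch that rebuilds the sample data and splits the one-liner into its parts. The names is_d, d_count and has_d are illustrative only and do not appear in the original answer.

```r
# Rebuild the sample data from the question (factor columns, as in the original).
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a"),
  stringsAsFactors = TRUE
)

is_d    <- df == "d"       # logical matrix: one TRUE/FALSE per cell
d_count <- rowSums(is_d)   # TRUE counts as 1 and FALSE as 0, so this counts "d"s per row
has_d   <- d_count > 0     # TRUE for rows holding at least one "d"
```

Printing is_d and d_count at the console shows how each row's count leads to the final TRUE/FALSE value.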
If you need to do this for several different values, you could write a function.
fun <- function(data, what) {
    data$z <- rowSums(data == what) > 0
    data
}
fun(df, "b")
fun(df, "d")
lapply(c("a", "b"), fun, data = df)
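Since the question asks about searching specific columns, the same idea can be limited to a subset of them. This is only a sketch: has_value and its cols argument are hypothetical names introduced here, not part of the answer above.

```r
# Sample data from the question.
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: exact-match search limited to the columns named in cols.
has_value <- function(data, what, cols = names(data)) {
  # data[cols] keeps a data frame even for a single column, so rowSums still works
  rowSums(data[cols] == what) > 0
}

# Only look for "d" in columns x and y; column w is ignored.
df$z <- has_value(df, "d", cols = c("x", "y"))
```

Returning the logical vector rather than a modified data frame leaves the choice of column name to the caller.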
Another method would be to use apply across the rows. any is a function that returns TRUE if at least one of the values passed to it is TRUE.
df$z <- apply(df == "d", 1, any)
With regard to reference material, I believe the best place to learn R is from the people who wrote R. Check out the manuals at http://cran.r-project.org/doc/manuals/
Upvotes: 4
Reputation: 24545
The following can also be used:
df$result = apply(df, 1, function(x) any(grepl("d",x)))
df
w x y result
1 d a d TRUE
2 c b a FALSE
3 b a a FALSE
4 d b a TRUE
5 d d a TRUE
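If a real pattern is needed and only some columns should be searched, grepl can be run once per column instead of once per row. The sketch below assumes the columns of interest are w and x and uses the pattern "^[cd]$" purely as an illustration; cols, pattern and hits are names chosen for this example.

```r
# Sample data from the question.
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a"),
  stringsAsFactors = TRUE
)

cols    <- c("w", "x")   # assumed columns to search
pattern <- "^[cd]$"      # assumed pattern: a value that is exactly "c" or "d"

# grepl is vectorized over a column, so sapply yields a logical matrix
# with one column per searched variable.
hits <- sapply(df[cols], function(col) grepl(pattern, col))

# A row matches if the pattern was found in any of the searched columns.
df$result <- rowSums(hits) > 0
```

Working column by column avoids calling grepl once for every row, which may matter on a large data frame.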
Upvotes: 3