Ryan

Reputation: 640

Searching across several columns of a dataframe

I'm new to R and blown away by how quickly it can manipulate data and return readable information. For now, though, I'm stuck.

I have a large dataset that I've imported as a data frame. I'd like to search across specific columns of the data frame using regex (grepl?) and place the results of the search into a new column. I thought I could do this with apply or ddply, but I can't seem to wrap my mind around the functions well enough to do this.

Here's a sample data frame...

df <- structure(list(w = structure(c(3L, 2L, 1L, 3L, 3L), .Label = c("b", 
"c", "d"), class = "factor"), x = structure(c(1L, 2L, 1L, 2L, 
3L), .Label = c("a", "b", "d"), class = "factor"), y = structure(c(2L, 
1L, 1L, 1L, 1L), .Label = c("a", "d"), class = "factor")), .Names = c("w", 
"x", "y"), row.names = c(NA, -5L), class = "data.frame")

which returns...

  w x y
1 d a d
2 c b a
3 b a a
4 d b a
5 d d a

I've tried: search <- apply(df, 2, function(x){grepl("d", x, perl=TRUE)}) (among other things), which returns:

         w     x     y
[1,]  TRUE FALSE  TRUE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,]  TRUE FALSE FALSE
[5,]  TRUE  TRUE FALSE

What I'd like to have as a result is...

  w x y z
1 d a d TRUE
2 c b a FALSE
3 b a a FALSE
4 d b a TRUE
5 d d a TRUE

I realize this seems very trivial to those of you who are advanced. Thanks in advance for taking the time to help me learn. Additionally, while I'm looking for an answer to this specific problem, I'd love to hear suggestions on things to study/read that will help me get a better grasp on this type of data manipulation.

Upvotes: 2

Views: 766

Answers (2)

Rich Scriven

Reputation: 99331

You don't need regular expressions for this. You can use rowSums.

The comparison df == "d" returns a logical matrix with one entry per cell of the data frame. Since TRUE counts as 1 and FALSE as 0 when summed, any row sum greater than zero means the row contains at least one "d".

> df$z <- rowSums(df == "d") > 0
> df
#   w x y     z
# 1 d a d  TRUE
# 2 c b a FALSE
# 3 b a a FALSE
# 4 d b a  TRUE
# 5 d d a  TRUE
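
The question asks about searching only specific columns. A minimal sketch of how the same idea might be restricted to a subset, assuming the columns of interest are w and x (the name cols is hypothetical, and the sample data is rebuilt here with plain character columns):

```r
# Rebuild the sample data from the question (character columns for simplicity).
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a")
)

# Hypothetical choice of columns to search; adjust to your data.
cols <- c("w", "x")

# df[cols] == "d" gives a logical matrix for just those columns.
# na.rm = TRUE keeps a missing value from turning the whole row sum into NA.
df$z <- rowSums(df[cols] == "d", na.rm = TRUE) > 0
df
```

Row 1 is still flagged because w is "d"; the "d" in column y is simply ignored.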

If you need to do this for several different values, you could write a function.

fun <- function(data, what) {
    data$z <- rowSums(data == what) > 0
    data
}
fun(df, "b")
fun(df, "d")
lapply(c("a", "b"), fun, data = df)

Another method would be to use apply across the rows. any is a function that returns TRUE if at least one element of its argument is TRUE.

df$z <- apply(df == "d", 1, any)
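
The second argument of apply is the margin: 1 works over rows, 2 over columns. The attempt in the question used 2, which is why it produced one result per column rather than one per row. A small sketch of the difference, using the sample data rebuilt with character columns (the names m, by_col and by_row are only for illustration):

```r
# Sample data from the question, as plain character columns.
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a")
)

m <- df == "d"               # 5 x 3 logical matrix

by_col <- apply(m, 2, any)   # margin 2: one value per column (w, x, y)
by_row <- apply(m, 1, any)   # margin 1: one value per row, which is what z needs

by_col
by_row
```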

With regard to reference material, I believe the best place to learn R is from the people who wrote R. Check out the manuals at http://cran.r-project.org/doc/manuals/

Upvotes: 4

rnso

Reputation: 24545

The following can also be used:

df$result = apply(df, 1, function(x) any(grepl("d",x)))
df
  w x y result
1 d a d   TRUE
2 c b a  FALSE
3 b a a  FALSE
4 d b a   TRUE
5 d d a   TRUE
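
Note that grepl("d", x) matches "d" anywhere in a value, so a longer string containing a "d" would also count. If a regex is really needed, one possible variant is to run grepl on each column and combine the results, which avoids converting the whole data frame to a character matrix. This is only a sketch: the names cols and hits and the anchored pattern "^d$" (exact match) are assumptions for illustration.

```r
# Sample data from the question, as plain character columns.
df <- data.frame(
  w = c("d", "c", "b", "d", "d"),
  x = c("a", "b", "a", "b", "d"),
  y = c("d", "a", "a", "a", "a")
)

# Hypothetical set of columns to search.
cols <- c("w", "x", "y")

# One logical vector per column; as.character() also covers factor columns.
hits <- lapply(df[cols], function(col) grepl("^d$", as.character(col)))

# Element-wise OR across the per-column results: TRUE if any column matched.
df$result <- Reduce(`|`, hits)
df
```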

Upvotes: 3
