I currently wish to divide a data frame into subsets for training/testing. In the data frame there are columns that contain different items, and some contain sub-items such as Aisle01, Aisle02, etc. I am getting tripped up by filtering on a partial string across multiple columns.
Data sample:
Column1 Column2 Column3
Wall01 Wall04 45.6
Wall04 Aisle02 65.7
Aisle06 Wall01 45.0
Aisle01 Wall01 33.3
Wall01 Wall04 21.1
My data frame (x) has two columns that each contain multiple versions of "Aisle" (Aisle01, Aisle02, ...). I wish to pull out every row in which either of those columns contains "Aisle". Is the line below somewhat on the right track?
filter(x, column1 & column2 == grep(x$column1 & x$column2, "Aisle"))
Desired result:
Column1 Column2 Column3
Wall04 Aisle02 65.7
Aisle06 Wall01 45.0
Aisle01 Wall01 33.3
Thank you in advance.
The easiest solution I can see would be this:
# keep the rows where Column1 or Column2 contains "Aisle" (R column names are case-sensitive)
x <- x[grepl("Aisle", x[["Column1"]]) | grepl("Aisle", x[["Column2"]]), ]
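A self-contained sketch of that line applied to the sample data from the question. It assumes the columns are named Column1, Column2 and Column3 exactly as in the data sample, and it keeps the complementary rows too (other_rows is an illustrative name), since the goal was to split the data into two subsets:

```r
# Rebuild the sample data from the question (column names as shown in the sample)
x <- data.frame(
  Column1 = c("Wall01", "Wall04", "Aisle06", "Aisle01", "Wall01"),
  Column2 = c("Wall04", "Aisle02", "Wall01", "Wall01", "Wall04"),
  Column3 = c(45.6, 65.7, 45.0, 33.3, 21.1),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does either column contain the substring "Aisle"?
has_aisle <- grepl("Aisle", x[["Column1"]]) | grepl("Aisle", x[["Column2"]])

aisle_rows <- x[has_aisle, ]   # rows 2, 3, 4: the desired result
other_rows <- x[!has_aisle, ]  # rows 1, 5: everything else
print(aisle_rows)
```

Assigning to new names instead of overwriting x keeps both subsets available, which is convenient when the two pieces are meant for training and testing.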
Using grepl instead of grep produces a logical vector (one TRUE/FALSE per row), so you can combine the two tests with the | operator to select your rows. Also, I just wanted to quickly go over a few places in your code that may be giving you trouble.
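The x$column1 & x$column2 at the beginning of your grep call means that R will try to apply the & operation pairwise to the entries of column1 and column2. Since these are character strings rather than logicals, this does not work: on character columns R stops with an error ("operations are possible only for numeric, logical or complex types"), and on factor columns it returns NA with a warning.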
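A small sketch of the difference, using two short hypothetical vectors a and b taken from the sample data: applying & to the strings themselves fails, while applying it (or |) to the logical results of two pattern tests works as intended.

```r
a <- c("Wall01", "Wall04", "Aisle06")
b <- c("Wall04", "Aisle02", "Wall01")

# & is only defined for logical, numeric and complex vectors; on character
# vectors it signals an error, which tryCatch turns into the message string
res <- tryCatch(a & b, error = function(e) conditionMessage(e))
print(res)

# Combine the logical results of the two pattern tests instead
either <- grepl("Aisle", a) | grepl("Aisle", b)  # FALSE TRUE TRUE
both   <- grepl("Aisle", a) & grepl("Aisle", b)  # FALSE FALSE FALSE
```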
In grep, the pattern you are trying to match comes before the strings you are searching, so it should be grep("Aisle", columnValue), not the other way around. Running ?functionName (for example ?grep) shows the documentation for a function, so you don't have to try to remember the argument order.
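A quick sketch of the argument order and of what each variant returns (v is a hypothetical vector built from the sample values). grep returns positions or, with value = TRUE, the matching strings, while grepl returns one logical per element, which is what row selection needs:

```r
v <- c("Wall01", "Aisle02", "Wall04", "Aisle06")

# Pattern first, then the character vector to search
idx  <- grep("Aisle", v)                # integer positions of matches: 2 4
vals <- grep("Aisle", v, value = TRUE)  # the matching strings: "Aisle02" "Aisle06"
lgl  <- grepl("Aisle", v)               # one logical per element: FALSE TRUE FALSE TRUE

# fixed = TRUE matches the literal substring instead of a regular expression
lgl_fixed <- grepl("Aisle", v, fixed = TRUE)
```

Because "Aisle" contains no regular-expression metacharacters, the fixed = TRUE version gives the same result here; it mainly matters when the search text contains characters such as "." or "(".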
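In base R, filter (stats::filter) applies linear filtering to time series (ts) objects and numeric vectors, not to data frames. I am surprised you didn't get an error by using it in this way. (If you had dplyr loaded, its filter does work on data frames, but the condition still has to be a logical expression such as grepl("Aisle", Column1) | grepl("Aisle", Column2).)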
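To illustrate the distinction, a short sketch: stats::filter computing a three-point moving average on a numeric vector, next to base R's subset doing row selection on a data frame with the same grepl condition as above. The dplyr::filter line is an optional comparison that only runs if the dplyr package happens to be installed; it is not part of the original answer.

```r
# stats::filter: linear (moving-average) filtering of a numeric series
ma <- stats::filter(1:5, rep(1/3, 3))  # centered 3-point average: NA 2 3 4 NA

# Row selection on a data frame in base R: subset() with a logical condition
x <- data.frame(
  Column1 = c("Wall01", "Wall04", "Aisle06", "Aisle01", "Wall01"),
  Column2 = c("Wall04", "Aisle02", "Wall01", "Wall01", "Wall04"),
  Column3 = c(45.6, 65.7, 45.0, 33.3, 21.1),
  stringsAsFactors = FALSE
)
aisle_rows <- subset(x, grepl("Aisle", Column1) | grepl("Aisle", Column2))

# Optional: the same condition inside dplyr::filter, only if dplyr is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  aisle_rows_dplyr <- dplyr::filter(x, grepl("Aisle", Column1) | grepl("Aisle", Column2))
}
```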
Best of luck. Comment if you want anything clarified.