Reputation: 336
I have a data set with 313 columns and about 52,000 rows. I need to remove every column that contains the word "PERMISSIONS". I've tried grep and dplyr, but I can't get either to work.
I've read the file in,
testSet <- read.csv("/Users/.../data.csv")
Other examples show how to remove columns by exact name, but I don't know how to handle wildcards, and I'm not sure where to go from here.
Upvotes: 7
Views: 26416
Reputation: 209
If you just want to remove columns whose names contain PERMISSIONS, you can use the select function from the dplyr package.
library(dplyr)
df <- data.frame("PERMISSIONS" = c(1,2), "Col2" = c(1,4), "Col3" = c(1,2))
#   PERMISSIONS Col2 Col3
# 1           1    1    1
# 2           2    4    2
# drop every column whose name contains "PERMISSIONS"
df_sub <- select(df, -contains("PERMISSIONS"))
#   Col2 Col3
# 1    1    1
# 2    4    2
Upvotes: 16
Reputation: 2806
It looks like these answers only do part of what you want. I think this is what you're looking for. There is probably a better way to write this though.
library(data.table)
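df = data.frame("PERMISSIONS" = c(1,2), "Col2" = c("PERMISSIONS","A"), "Col3" = c(1,2))
#   PERMISSIONS        Col2 Col3
# 1           1 PERMISSIONS    1
# 2           2           A    2
# step 1: drop columns whose name contains "PERMISSIONS"
df = df[,!grepl("PERMISSIONS",colnames(df))]
setDT(df)
# step 2: flag every cell containing "PERMISSIONS", then keep only columns with no flagged cell
ind = df[, lapply(.SD, function(x) grepl("PERMISSIONS", x, perl=TRUE))]
df[,which(colSums(ind) == 0), with = FALSE]
#    Col3
# 1:    1
# 2:    2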
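For comparison, here is a minimal base R sketch of the same two checks (column names and cell values) without data.table. The helper name drop_matching_cols is made up for illustration, and it assumes a plain substring match is what you want:

```r
# Hypothetical helper (not from any package): drop every column whose
# name OR any of its values contains `pattern` as a literal substring.
drop_matching_cols <- function(data, pattern) {
  # TRUE where the column name itself contains the pattern
  name_hit <- grepl(pattern, colnames(data), fixed = TRUE)
  # TRUE where at least one cell in the column contains the pattern
  value_hit <- vapply(
    data,
    function(col) any(grepl(pattern, as.character(col), fixed = TRUE)),
    logical(1)
  )
  # drop = FALSE keeps a data frame even when one column is left
  data[, !(name_hit | value_hit), drop = FALSE]
}

df <- data.frame(PERMISSIONS = c(1, 2),
                 Col2 = c("PERMISSIONS", "A"),
                 Col3 = c(1, 2))
result <- drop_matching_cols(df, "PERMISSIONS")
print(colnames(result))
```

On the toy data above this should leave only Col3, the same as the data.table version.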
Upvotes: 2
Reputation: 1610
From what I could understand from the question, the OP has a data frame like this:
df <- read.table(text = '
a b c d
e f PERMISSIONS g
h i j k
PERMISSIONS l m n',
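stringsAsFactors = FALSE)
The goal is to remove every column that has any 'PERMISSIONS' entry. Assuming the entries are always exactly 'PERMISSIONS' (the comparison is an exact match, not a substring search), this code should work:
# count exact matches per column; na.rm = TRUE stops NA cells from turning a count into NA
cols <- colSums(mapply('==', 'PERMISSIONS', df), na.rm = TRUE)
# drop = FALSE keeps a data frame even if only one column survives
new.df <- df[, which(cols == 0), drop = FALSE]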
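If the entries do vary (mixed case, a suffix, and so on), the exact comparison will miss them. A possible variation, sketched here with Filter and a case-insensitive grepl, treats any cell that contains the word as a hit. The sample data is modified from the one above purely for illustration:

```r
# Same toy layout as above, but with one entry in lower case plus a suffix
df <- read.table(text = '
a b c d
e f permissions_read g
h i j k
PERMISSIONS l m n',
stringsAsFactors = FALSE)

# Filter() keeps the columns for which the function returns TRUE,
# i.e. columns where no cell contains "permissions" in any case
new.df <- Filter(
  function(col) !any(grepl("permissions", col, ignore.case = TRUE)),
  df
)
print(colnames(new.df))
```

Here V1 and V3 each hold a matching cell, so only V2 and V4 should remain.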
Upvotes: 6
Reputation: 887038
We can use grepl with ! to negate the match:
New.testSet <- testSet[!grepl("PERMISSIONS",row.names(testSet)),
!grepl("PERMISSIONS", colnames(testSet))]
Upvotes: 3
Reputation: 834
Try this,
New.testSet <- testSet[,!grepl("PERMISSIONS", colnames(testSet))]
EDIT: changed script as per comment.
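On the wildcard part of the question: the first argument to grepl is a regular expression, so the pattern can be adjusted to taste. A rough sketch of a few variants follows. The small testSet here is a made-up stand-in for the real file, and its column names are assumptions:

```r
# Made-up stand-in for the real data; column names are illustrative only
testSet <- data.frame(ID = 1:2,
                      USER_PERMISSIONS = c("r", "w"),
                      PERMISSIONS_ADMIN = c(TRUE, FALSE),
                      permissions_note = c("x", "y"),
                      NAME = c("a", "b"))
nms <- colnames(testSet)

# literal substring anywhere in the name (case-sensitive)
keep_any <- !grepl("PERMISSIONS", nms, fixed = TRUE)
# only names that START with the word
keep_prefix <- !grepl("^PERMISSIONS", nms)
# substring match that ignores case
keep_nocase <- !grepl("permissions", nms, ignore.case = TRUE)
# shell-style wildcard, translated to a regex by utils::glob2rx
keep_glob <- !grepl(glob2rx("*PERMISSIONS*"), nms)

# drop = FALSE keeps a data frame even if a single column is left
New.testSet <- testSet[, keep_nocase, drop = FALSE]
print(colnames(New.testSet))
```

Swap keep_nocase for whichever logical vector fits your column naming.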
Upvotes: 6