Reputation: 1122
I have a data frame with 8 columns and many many rows. I would like to remove the rows contain more than one string in column 6 and 7 and output a data frame with only one string in column 6 and 7
df:
ID Content_ID Chromosome Start Stop Reference Alternate Length
1299675221 backbone 12 99675221 99675221 GG T 0
1298583685 backbone 12 98583685 98583685 C T 0
129833474 backbone 12 9833474 9833474 C T 0
1297722695 backbone 12 97722695 97722695 A G 0
1297381269 backbone 12 97381269 97381269 T C 0
1297081605 backbone 12 97081605 97081605 G AA 0
1297058068 backbone 12 97058068 97058068 T C 0
1295891848 backbone 12 95891848 95891848 CCTT ATA 0
1294164312 backbone 12 94164312 94164312 T C 0
12940191 backbone 12 940191 940191 T C 0
desired output:
ID Content_ID Chromosome Start Stop Reference Alternate Length
1298583685 backbone 12 98583685 98583685 C T 0
129833474 backbone 12 9833474 9833474 C T 0
1297722695 backbone 12 97722695 97722695 A G 0
1297381269 backbone 12 97381269 97381269 T C 0
1297058068 backbone 12 97058068 97058068 T C 0
1294164312 backbone 12 94164312 94164312 T C 0
12940191 backbone 12 940191 940191 T C 0
Upvotes: 3
Views: 688
Reputation: 13807
Simple as this using data.table
library(data.table)
setDT(df)
df <- df[ nchar(Reference)==1 & nchar(Alternate)==1]
Upvotes: 1
Reputation: 38500
Similarly, you can paste the columns together and then keep rows where the number of characters is equal to 3, one for each column and a space.
df[nchar(paste(df$Reference, df$Alternate)) == 3,]
ID Content_ID Chromosome Start Stop Reference Alternate Length
2 1298583685 backbone 12 98583685 98583685 C T 0
3 129833474 backbone 12 9833474 9833474 C T 0
4 1297722695 backbone 12 97722695 97722695 A G 0
5 1297381269 backbone 12 97381269 97381269 T C 0
7 1297058068 backbone 12 97058068 97058068 T C 0
9 1294164312 backbone 12 94164312 94164312 T C 0
10 12940191 backbone 12 940191 940191 T C 0
Upvotes: 2
Reputation: 887138
We can loop through the columns 6 and 7 using lapply
, check whether the number of characters is 1, use Reduce
with &
to get a logical vector
by comparing the corresponding elements of the list
, use it to subset the rows of 'df'
df[Reduce(`&`, lapply(df[6:7], function(x) nchar(x)==1)),]
# ID Content_ID Chromosome Start Stop Reference Alternate Length
#2 1298583685 backbone 12 98583685 98583685 C T 0
#3 129833474 backbone 12 9833474 9833474 C T 0
#4 1297722695 backbone 12 97722695 97722695 A G 0
#5 1297381269 backbone 12 97381269 97381269 T C 0
#7 1297058068 backbone 12 97058068 97058068 T C 0
#9 1294164312 backbone 12 94164312 94164312 T C 0
#10 12940191 backbone 12 940191 940191 T C 0
Or another option is rowSums
df[!rowSums(nchar(as.matrix(df[6:7]))!=1),]
Upvotes: 3