Adrienne

Reputation: 11

How to exclude (drop) cell values that are numbers in a character column

I have a large data set with one column that includes both characters (e.g. "A", "B", etc.) and numbers, but the numbers are read in as characters as well. I want to get rid of all rows where the value in this column is a number. For simplicity, I will show just a mock vector representing the issue I am having with the column.

For example,

data <- c("A", "A", "B", "B", "1", "2", "-2")

I inherited this data, and it is a large data set. Is there a good way to drop the cells containing the numbers 1, 2, -2, which are read in as characters?

Thanks for the help.

Upvotes: 0

Views: 398

Answers (2)

talat

Reputation: 70326

A simple option would be:

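data <- droplevels(data[is.na(suppressWarnings(as.numeric(as.character(data$col)))), ])

Convert the column (col) to character and then to numeric, and keep the rows whose values turn into NA (which means they are not numbers). The as.character() step matters if the column is a factor, since as.numeric() on a factor returns the integer level codes rather than NA. Then drop the factor levels that are no longer in use.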
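
As a minimal sketch of this on a whole data frame: the data frame df and the column names col and val below are made up for illustration, and col is deliberately a factor to show why converting to character first is needed (as.numeric() on a factor returns the level codes, so nothing would turn into NA).

```r
# Hypothetical data frame; 'df', 'col' and 'val' are made-up names
df <- data.frame(col = factor(c("A", "A", "B", "B", "1", "2", "-2")),
                 val = 1:7)

# Convert factor -> character -> numeric; non-numbers become NA
is_num <- !is.na(suppressWarnings(as.numeric(as.character(df$col))))

# Keep the non-numeric rows and drop the factor levels no longer in use
res <- droplevels(df[!is_num, ])
levels(res$col)
#[1] "A" "B"
```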

Some example usages:

v1 <- c('A12', 'AB12', '-2.53', '25.29', 'BCd')
v1[is.na(suppressWarnings(as.numeric(v1)))]
#[1] "A12"  "AB12" "BCd"

Or with special characters:

v1 <- c('A_12', 'AB12', '-2.53', '25.29', 'B-Cd')
v1[is.na(suppressWarnings(as.numeric(v1)))]
#[1] "A_12" "AB12" "B-Cd"

Upvotes: 1

akrun

Reputation: 887711

One simple regex option is below. I subset the dataset with grepl, dropping the elements that consist only of digits, dots, and minus signs from the beginning (^) to the end ($) of the string.

subdat <- droplevels(data[!grepl('^[0-9.-]+$', data$yourCol),])

If the column is a factor, you can use droplevels to drop the unused levels, or call factor again on the column. Then check the result with levels(data$yourCol). Another option is to convert the column to character with data$yourCol <- as.character(data$yourCol) and inspect it with unique(data$yourCol).
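
To illustrate the point about unused levels, here is a small sketch. The vector yourCol is a made-up stand-in for data$yourCol, not the asker's actual data.

```r
# Hypothetical factor standing in for data$yourCol
yourCol <- factor(c("A", "B", "1", "-2"))

# Subsetting a factor keeps all of its original levels
kept <- yourCol[!grepl('^[0-9.-]+$', yourCol)]
length(levels(kept))        # still 4, although only "A" and "B" remain

# Either of these drops the unused levels
levels(droplevels(kept))
levels(factor(kept))

# Alternatively, work with a plain character vector
unique(as.character(kept))
```

In all three cases only "A" and "B" should remain.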

Testing with some example data

 v1 <- c('A12', 'AB12', '-2.53', '25.29', 'BCd', '-12AB5', '-AB125', '- ')
 v1[!grepl('^[0-9.-]+$', v1)]
 #[1] "A12"    "AB12"   "BCd"    "-12AB5" "-AB125" "- "    

Double-checking with @docendodiscimus's code

 v1[is.na(suppressWarnings(as.numeric(v1)))]
 #[1] "A12"    "AB12"   "BCd"    "-12AB5" "-AB125" "- "    

NOTE: I did update the regex after finding that the initial one may not work in some cases.

Upvotes: 0
