Subset rows of a data frame that contain numbers in all of the columns

I want to subset my data frame, keeping only the rows that have numeric values in all columns, so

>small
     0    16h    24h    48h
ID1  1    0      0   
ID2  453  254    21     12  
ID3  true  3     2      1
ID4  65    23    12     12

would be

>small_numeric
     0    16h    24h    48h  
ID2  453  254    21     12  
ID4  65    23    12     12

I tried

sapply(small, is.numeric)

but got this

0      16h    24h    48h   
FALSE  FALSE  FALSE  FALSE
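The all-FALSE result says something about how the data is stored: is.numeric() reports whether a whole column has a numeric type, not whether each cell looks like a number. It therefore returns FALSE for character or factor columns even when most of their values are digits. A minimal sketch with a hypothetical two-column data frame df, contrasting the column-level check with a cell-level one:

```r
# Hypothetical toy data: every column stored as character, which is what
# the all-FALSE result from sapply(small, is.numeric) suggests
df <- data.frame(a = c("1", "453", "true"),
                 b = c("0", "254", "3"),
                 stringsAsFactors = FALSE)

# Column-level check: is the column's type numeric?
col_check <- sapply(df, is.numeric)   # a = FALSE, b = FALSE

# Cell-level check: can each value be parsed as a number?
# suppressWarnings() hides the "NAs introduced by coercion" warning
cell_check <- !is.na(suppressWarnings(sapply(df, as.numeric)))
cell_check
```

Only the cell-level version can tell that "true" in column a is not a number while "1" and "453" are.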

Answers (1)

Jaap

Using:

small[!rowSums(is.na(sapply(small, as.numeric))),]

gives:

      0 16h 24h 48h
ID2 453 254  21  12
ID4  65  23  12  12

What this does:

With sapply(small, as.numeric) you force all columns to numeric. Values that can't be parsed as numbers are converted to NA (R emits a "NAs introduced by coercion" warning).
Next you count the NA values per row with rowSums(is.na(sapply(small, as.numeric))). This returns a numeric vector, [1] 1 0 1 0, giving the number of missing or non-numeric values in each row.
Negating this with ! turns 0 into TRUE and any positive count into FALSE, so you get a logical vector marking the rows where all columns have numeric values.
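The three steps can be traced one at a time. This sketch rebuilds the example with data.frame() instead of read.table(), assuming the first column ends up as character and the others as integer, and it wraps the coercion in suppressWarnings(), which the one-liner above does not do:

```r
# Rebuild the example: ID1 has no 48h value, ID3 has "true" in column 0
small <- data.frame(
  `0`   = c("1", "453", "true", "65"),
  `16h` = c(0L, 254L, 3L, 23L),
  `24h` = c(0L, 21L, 2L, 12L),
  `48h` = c(NA, 12L, 1L, 12L),
  row.names = c("ID1", "ID2", "ID3", "ID4"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Step 1: coerce every column to numeric; "true" becomes NA
num <- suppressWarnings(sapply(small, as.numeric))

# Step 2: count the NAs in each row
na_count <- rowSums(is.na(num))   # 1 0 1 0

# Step 3: ! maps 0 to TRUE and any positive count to FALSE
keep <- !na_count                 # FALSE TRUE FALSE TRUE

result <- small[keep, ]
result
```

Note that ID1 is dropped because of a genuinely missing value, not a non-numeric one; the NA count does not distinguish the two.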

Used data:

small <- read.table(text="     0    16h    24h    48h
ID1  1    0      0     
ID2  453  254    21     12  
ID3  true  3     2      1
ID4  65    23    12     12", header=TRUE, stringsAsFactors = FALSE, fill = TRUE, check.names = FALSE)
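A variant of the same filter uses base R's complete.cases(), which returns TRUE for rows with no NA values and accepts a matrix. This is a swapped-in technique, not the one used above; the data is rebuilt with data.frame() under the same column-type assumptions:

```r
small <- data.frame(
  `0`   = c("1", "453", "true", "65"),
  `16h` = c(0L, 254L, 3L, 23L),
  `24h` = c(0L, 21L, 2L, 12L),
  `48h` = c(NA, 12L, 1L, 12L),
  row.names = c("ID1", "ID2", "ID3", "ID4"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Coerce to a numeric matrix, then keep rows without any NA
keep <- complete.cases(suppressWarnings(sapply(small, as.numeric)))
small[keep, ]
```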

For the updated example data, the problem is that the columns containing non-numeric values are factors rather than character. In that case you have to adapt the above code as follows:

testdata[!rowSums(is.na(sapply(testdata[-1], function(x) as.numeric(as.character(x))))),]

which gives:

      0  16h  24h  48h   NA
ID2 ID2   46   23   23   48
ID3 ID3   44   10   14   22
ID4 ID4   17   11    4   24
ID5 ID5   13    5    3   18
ID7 ID7 4387 4216 2992 3744

Extra explanation:

When converting factor columns to numeric, you have to convert them to character first, hence as.numeric(as.character(x)). If you don't, as.numeric will return the integer codes of the factor levels instead of the values themselves.
I used testdata[-1] because I assumed you didn't want to include the first column in the check for numeric values.
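The factor pitfall is easy to reproduce. The sketch below first shows the level codes that as.numeric() returns on a factor, then wraps the answer's approach in a helper; rows_all_numeric and the toy data frame are hypothetical names chosen for illustration. The helper uses Reduce() over per-column logical vectors instead of sapply() plus rowSums(), so it also works when the data has a single row:

```r
# Levels are sorted as strings, so "10" gets code 1, "5" code 2, "abc" code 3
f <- factor(c("10", "5", "abc"))
codes  <- as.numeric(f)                                  # level codes, not values
values <- suppressWarnings(as.numeric(as.character(f)))  # 10 5 NA

# Hypothetical helper: TRUE for rows whose selected columns all parse as numbers.
# as.character() first makes it safe for both character and factor columns.
rows_all_numeric <- function(df, cols = seq_along(df)) {
  ok <- lapply(df[cols], function(x) {
    !is.na(suppressWarnings(as.numeric(as.character(x))))
  })
  Reduce(`&`, ok)  # element-wise AND across the selected columns
}

# Hypothetical toy data: an ID column, a factor column and a character column
toy <- data.frame(id = c("ID1", "ID2"),
                  x  = factor(c("7", "n/a")),
                  y  = c("3", "4"),
                  stringsAsFactors = FALSE)

keep <- rows_all_numeric(toy, -1)  # skip the first column, like testdata[-1]
toy[keep, ]
```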
