zesla

Reputation: 11793

How to extract rows where a specific column is not a number in dplyr

Here is my dataframe:

df <- data.frame(a = c(1:10),
                 b= c(11:15, NA, NaN, '', 20, 22))

a   b
1   11          
2   12          
3   13          
4   14          
5   15          
6   NA          
7   NaN         
8               
9   20          
10  22

What I need to do is extract the rows where the value in column b is not a number. In this case, I need the rows where column a is 6, 7, and 8. I need a general solution that works for any large dataset.
I tried:

df %>% filter(!is.numeric(b))

But it does not work, and I have no idea how to achieve this. Thanks in advance for any help.

Upvotes: 1

Views: 1477

Answers (2)

Aramis7d

Reputation: 2496

Considering the data as:

df <- data.frame(a = c(1:10),
                 b= c(11:15, NA, NaN, '', 20, 22))

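The first issue I can see is that b is read in as a factor (the default for string columns in R versions before 4.0.0; from 4.0.0 on it is a character column instead), which can be checked with: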

str(df)

giving us

'data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: Factor w/ 9 levels "","11","12","13",..: 2 3 4 5 6 NA 9 1 7 8
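
The non-numeric column type is also why the original `filter(!is.numeric(b))` keeps every row: `is.numeric()` tests the type of the whole vector and returns a single value, not one value per row. A small sketch of the difference, where `stringsAsFactors = FALSE` is my own assumption so that it behaves the same on any R version, and `b_num` is just an illustrative name:

```r
df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22),
                 stringsAsFactors = FALSE)

# is.numeric() looks at the column type, so it returns one value for the whole column
is.numeric(df$b)        # FALSE, because b is character

# Converting to numeric works element by element; entries that cannot be parsed become NA
b_num <- as.numeric(df$b)

# One TRUE/FALSE per row, which is what filter() needs
is.na(b_num)
```

Because `!is.numeric(b)` is a single `TRUE`, it gets recycled across all rows and nothing is filtered out.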

With this in mind, we can tweak your existing approach to something like:

df %>% 
  mutate( b = as.numeric(as.character(b))) %>%
  filter(is.nan(b) | is.na(b)) 

which gives us:

  a   b
1 6  NA
2 7 NaN
3 8  NA
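
Since `is.na()` already returns `TRUE` for `NaN`, the filter can probably be reduced to a single test. The variant below is only a sketch of that idea; it also leaves column b untouched instead of overwriting it, and the name `non_numeric` is made up for illustration:

```r
library(dplyr)

df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22))

# as.character() makes this work whether b is a factor (R < 4.0.0) or character.
# suppressWarnings() hides the coercion warning that non-numeric strings can raise.
non_numeric <- df %>%
  filter(is.na(suppressWarnings(as.numeric(as.character(b)))))

non_numeric$a   # 6 7 8
```

Keeping the original values of b can be useful on a large dataset, because you can then see what the offending entries actually were.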

Upvotes: 2

leeum

Reputation: 274

This will leave only the rows that have numbers:

Base R:

new <- df[!is.na(as.numeric(as.character(df$b))),]

Reading from the innermost parentheses outward: it converts everything in column b to character, and then converts that to numeric. Any value that cannot be converted to a number is replaced with NA. The outer `!is.na()` then checks whether each converted value is NA, and rows where it is are filtered out.
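
The question asks for the opposite set of rows, which should only require dropping the `!`. A minimal base R sketch, with `b_num`, `numeric_rows` and `non_numeric_rows` as illustrative names:

```r
df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22))

# Convert once and reuse; suppressWarnings() hides any coercion warning
b_num <- suppressWarnings(as.numeric(as.character(df$b)))

numeric_rows     <- df[!is.na(b_num), ]   # rows where b is a number
non_numeric_rows <- df[is.na(b_num), ]    # rows where b is NA, NaN or not parseable
```

Note that this treats `NaN` as "not a number" too, because `is.na(NaN)` is `TRUE`.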

Upvotes: 1
