Reputation: 11793
Here is my dataframe:
df <- data.frame(a = c(1:10),
                 b = c(11:15, NA, NaN, '', 20, 22))
a b
1 11
2 12
3 13
4 14
5 15
6 NA
7 NaN
8
9 20
10 22
What I need to do is extract the rows where the value in column b is not a number.
In this case, I need to extract the rows where column a is 6, 7, 8. I definitely need a general solution that works for any large dataset.
I tried:
df %>% filter(!is.numeric(b))
But it does not work, and I have no clue how to achieve this. Thanks in advance for any help.
Upvotes: 1
Views: 1477
Reputation: 2496
Considering the data as:
df <- data.frame(a = c(1:10),
                 b = c(11:15, NA, NaN, '', 20, 22))
The first issue I can see is that b is stored as a factor rather than as numbers, which can be checked with:
str(df)
giving us
'data.frame': 10 obs. of 2 variables:
$ a: int 1 2 3 4 5 6 7 8 9 10
$ b: Factor w/ 9 levels "","11","12","13",..: 2 3 4 5 6 NA 9 1 7 8
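As a minimal sketch of why the filter in the question does not pick out individual rows: is.numeric() tests the type of the whole column and returns a single value, not one value per row. The stringsAsFactors = TRUE argument is an assumption added here so that the factor column shown above is reproduced on R 4.0 and later, where the default changed to FALSE.

```r
# Assumption: stringsAsFactors = TRUE is set explicitly to reproduce the
# factor column shown in the str() output above (it is no longer the default
# since R 4.0.0, where b would be a character column instead).
df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22),
                 stringsAsFactors = TRUE)

class(df$b)               # "factor"
is.numeric(df$b)          # one FALSE for the whole column
length(is.numeric(df$b))  # 1, so it cannot distinguish individual rows
```

Because the test yields a single value for the column, it cannot single out rows 6 to 8; a per-element check such as is.na() on the converted values is needed instead.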
With this in mind, we can just tweak your existing approach to something like:
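library(dplyr)

df %>%
  # convert factor -> character -> numeric; unparseable values become NA
  mutate(b = as.numeric(as.character(b))) %>%
  # is.na() is TRUE for both NA and NaN
  filter(is.na(b))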
which gives us:
a b
1 6 NA
2 7 NaN
3 8 NA
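The as.character() step matters because calling as.numeric() directly on a factor returns its internal level codes rather than the values. A small sketch, using a made-up three-element factor f purely for illustration:

```r
# f is a hypothetical example factor, not part of the original data
f <- factor(c("11", "20", ""))

as.numeric(f)                # level codes: 2 3 1
as.numeric(as.character(f))  # the actual values: 11 20 NA
```

The levels are sorted as "", "11", "20", which is why the codes come out as 2, 3, 1.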
Upvotes: 2
Reputation: 274
This will keep only the rows that have numbers (remove the ! to keep the non-numeric rows instead):
Base R:
new <- df[!is.na(as.numeric(as.character(df$b))), ]
Reading from the innermost parentheses outward: as.character() converts everything in column b to character, and as.numeric() then converts that to numeric. Any value that cannot be converted to a number becomes NA. Finally, !is.na() checks whether each converted value is NA and, if it is, drops that row. This is all base R.
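For the general case asked about in the question, the same idea can be wrapped in a small function. This is only a sketch: non_numeric_rows is a hypothetical helper name, not something from base R or dplyr, and it treats a value as "not a number" when as.numeric() yields NA or NaN.

```r
# Hypothetical helper: return the rows of `data` whose values in column `col`
# cannot be parsed as a number (NA, NaN, empty strings, other text).
non_numeric_rows <- function(data, col) {
  # as.character() first, so factor columns give their labels, not level codes;
  # suppressWarnings() hides the "NAs introduced by coercion" warning
  parsed <- suppressWarnings(as.numeric(as.character(data[[col]])))
  # is.na() is TRUE for both NA and NaN
  data[is.na(parsed), , drop = FALSE]
}

df <- data.frame(a = 1:10,
                 b = c(11:15, NA, NaN, '', 20, 22))

non_numeric_rows(df, "b")$a  # 6 7 8
```

Note that strings such as "Inf" parse as numbers under this definition, so they would not be returned.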
Upvotes: 1