Reputation: 35
this is my first stack overflow post so please bear with me...
I wish to subset a datatable in R so that only rows containing digits are included - and rows containing characters are excluded. For example the table
Date Temperature
41941.6656 1921
41941.6656 1921
41941.66561 1921
41941.66563 1921
41941.66564 1921
41941.pypito 1921
41941.66566 xWRET
41941.66567 1921
should become
Date Temperature
41941.6656 1921
41941.6656 1921
41941.66561 1921
41941.66563 1921
41941.66564 1921
41941.66567 1921
where xWRET represents some random character string that my stupid instrument often
I have tried is.numeric()
and grep()
however I cant get either to work.
I feel this should be very straightforward!
Thanks!
Upvotes: 2
Views: 1655
Reputation: 5856
If one is not yet familiar with data.table and the relative complexity its idiom, you could use base functions as well.
Assuming you are getting your data.frame columns as character.
df$Date <- as.numeric(df$Date)
df$Temperature <- as.numeric(df$Temperature)
or
df1 <- sapply(df, as.numeric)
and finally
df[complete.cases(df), ]
df1[complete.cases(df1), ]
to get
Date Temperature
[1,] 41941.67 1921
[2,] 41941.67 1921
[3,] 41941.67 1921
[4,] 41941.67 1921
[5,] 41941.67 1921
[6,] 41941.67 1921
more on intro to data.table
here
Upvotes: 2
Reputation: 92300
I think the more general and efficient usage of data.table
would be using .SD
and lapply
(in case you want to convert more than just two columns and you don't want to specify them manually)
Either
library(data.table)
na.omit(setDT(df)[, lapply(.SD, function(x) as.numeric(as.character(x)))])
# Date Temperature
# 1: 41941.67 1921
# 2: 41941.67 1921
# 3: 41941.67 1921
# 4: 41941.67 1921
# 5: 41941.67 1921
# 6: 41941.67 1921
Or
setDT(df)[, names(df) := lapply(.SD, function(x) as.numeric(as.character(x)))][complete.cases(df)]
# Date Temperature
# 1: 41941.67 1921
# 2: 41941.67 1921
# 3: 41941.67 1921
# 4: 41941.67 1921
# 5: 41941.67 1921
# 6: 41941.67 1921
Upvotes: 4
Reputation: 3297
This is most certainly not the best way to do it, but here it goes:
library(data.table) # use this package, it is great for perfomance
lines="
41941.6656 1921
41941.6656 1921
41941.66561 1921
41941.66563 1921
41941.66564 1921
41941.pypito 1921
41941.66566 xWRET
41941.66567 1921"
con <- textConnection(lines)
d = data.table(read.table(con,stringsAsFactors = FALSE,
sep=" ",
col.names=c("Date", "Temperature"),
fill=FALSE,
strip.white=TRUE))
close(con)
d<-d[!is.na(as.numeric(Temperature)) & !is.na(as.numeric(substr(Date,start=7,stop=7)))]
This to me, returns the following:
Date Temperature
1: 41941.6656 1921
2: 41941.6656 1921
3: 41941.66561 1921
4: 41941.66563 1921
5: 41941.66564 1921
6: 41941.66567 1921
Is this what you want?
Upvotes: 0