Reputation: 719
I am working on a data frame ('df_temp') with two columns customer id ('Custid') and income ('Income'):
Custid Income
<fctr> <dbl>
1 1003 29761.20
2 1004 98249.55
3 1006 23505.30
4 1007 72959.25
5 1009 114973.95
6 1010 25038.30
While checking if Income is numeric, I am facing the following problem:
Using $ to refer to Income returns TRUE:
> is.numeric(df_temp$Income)
[1] TRUE
Using [,2] or [,which(...)] to refer to Income returns FALSE:
> i <- which(names(df_temp)=='Income')
> is.numeric(df_temp[,i])
[1] FALSE
> is.numeric(df_temp[,2])
[1] FALSE
When I try to convert this vector to numeric using [,], I run into another issue:
> df_temp[,2] <- as.numeric(df_temp[,2])
Error: (list) object cannot be coerced to type 'double'
I always thought that $ and [] serve the same purpose when referring to a vector in a data frame.
Could somebody please help me understand the problem and convert this vector to numeric using the [,] expression?
Upvotes: 3
Views: 146
Reputation: 307
To fully answer the question, $ and [ do serve the same purpose on a standard data.frame object:
Custid <- c(1003, 1004, 1006, 1007, 1009, 1010)
Income <- c(29761.20, 98249.55, 23505.30, 72959.25, 114973.95, 25038.30)
mydf <- data.frame(Custid, Income)
class(mydf$Income); class(mydf[ , 2])
You're dealing with a tbl_df object:
library(dplyr)
mytbl_df <- tbl_df(mydf)
print(mytbl_df)
## A tibble: 6 × 2
# Custid Income
# <dbl> <dbl>
#1 1003 29761.20
#2 1004 98249.55
#3 1006 23505.30
#4 1007 72959.25
#5 1009 114973.95
#6 1010 25038.30
To get [ to work as usual on mytbl_df, just convert it back into a data.frame: newdf <- as.data.frame(mytbl_df).
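As a minimal sketch of that round trip, the snippet below builds a small tibble, converts it back, and compares what `[ , 2]` returns in each case. It assumes the tibble package is installed and uses `as_tibble()` in place of `tbl_df()`; the three-row data set and the name `mytbl` are made up for illustration.

```r
library(tibble)

# Small illustrative data set (a subset of the values in the question)
mydf <- data.frame(Custid = c(1003, 1004, 1006),
                   Income = c(29761.20, 98249.55, 23505.30))

mytbl <- as_tibble(mydf)        # tibble: [ , j] keeps the tibble structure
newdf <- as.data.frame(mytbl)   # plain data.frame: [ , j] drops to a vector

tbl_result <- is.numeric(mytbl[, 2])   # FALSE, a one-column tibble is not a numeric vector
df_result  <- is.numeric(newdf[, 2])   # TRUE, a bare numeric vector

class(newdf)                    # "data.frame"
```

The conversion only changes the class of the object; the column values themselves are untouched.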
Upvotes: 3
Reputation: 887118
We have a tbl_df object, so extracting with [ still returns a tbl_df, i.e.
df_temp[,i]
# A tibble: 6 × 1
# Income
# <dbl>
#1 29761.20
#2 98249.55
#3 23505.30
#4 72959.25
#5 114973.95
#6 25038.30
We can use [[ extraction instead:
df_temp[[i]]
#[1] 29761.20 98249.55 23505.30 72959.25 114973.95 25038.30
is.numeric(df_temp[[i]])
#[1] TRUE
df_temp <- structure(list(Custid = c(1003L, 1004L, 1006L, 1007L, 1009L,
1010L), Income = c(29761.2, 98249.55, 23505.3, 72959.25, 114973.95,
25038.3)), .Names = c("Custid", "Income"), row.names = c("1",
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))
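For the conversion part of the question, the same `[[` form can be used on the left-hand side of an assignment. The sketch below assumes the tibble package is installed and uses a small made-up tibble in which Custid is a factor, as in the question's printout; going through `as.character()` for the factor is a general R idiom rather than something stated in the thread.

```r
library(tibble)

# Illustrative data: Custid stored as a factor, Income as double
df_temp <- tibble(Custid = factor(c(1003, 1004, 1006)),
                  Income = c(29761.20, 98249.55, 23505.30))

i <- which(names(df_temp) == "Income")

# [[ works on both sides of the assignment and always handles a plain vector
df_temp[[i]] <- as.numeric(df_temp[[i]])
income_ok <- is.numeric(df_temp[[i]])

# For a factor, convert via character so the labels are used, not the level codes
df_temp[["Custid"]] <- as.numeric(as.character(df_temp[["Custid"]]))
custid_values <- df_temp[["Custid"]]
```

Calling `as.numeric()` directly on a factor would return the internal level codes (1, 2, 3) instead of the customer ids.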
Upvotes: 1
Reputation: 176648
You're not working with a data.frame. You're working with a "tbl_df". Subsetting a tbl_df using $ returns a vector. Subsetting a tbl_df using [ returns a tbl_df, and a tbl_df is not a numeric vector, so is.numeric returns FALSE.
One thing tbl_df does is use drop = FALSE when calling [. But it goes even further by actively preventing you from setting drop = TRUE:
x <- tbl_df(mtcars)
is.numeric(x[,"cyl",drop=TRUE])
# [1] FALSE
Warning messages:
1: drop ignored
So, you cannot use [ with a tbl_df in the way you want. You have to use $ or [[ to extract the vector.
is.numeric(x$cyl)
# [1] TRUE
is.numeric(x[["cyl"]])
# [1] TRUE
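If the goal is to check the type of several columns at once, one option that avoids `[` altogether is to iterate over the columns, since a tbl_df (like a data.frame) is a list of column vectors. This is only a sketch: it assumes the tibble package is installed, uses `as_tibble()` rather than `tbl_df()`, and the name `numeric_cols` is hypothetical.

```r
library(tibble)

x <- as_tibble(mtcars)

# vapply() visits each column as a plain vector, so is.numeric()
# sees the vector itself rather than a one-column tibble
numeric_cols <- vapply(x, is.numeric, logical(1))

numeric_cols[["cyl"]]   # TRUE
all(numeric_cols)       # TRUE, every mtcars column is numeric
```

The result is a named logical vector, so it can also be used to select only the numeric columns by name.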
Upvotes: 10