Jan-Benedikt Jagusch

Reputation: 719

Difference in Logical Statement between [,] and $

I am working on a data frame ('df_temp') with two columns customer id ('Custid') and income ('Income'):

  Custid    Income
  <fctr>     <dbl>
1   1003  29761.20
2   1004  98249.55
3   1006  23505.30
4   1007  72959.25
5   1009 114973.95
6   1010  25038.30

When I check whether Income is numeric, I get inconsistent results:

Using $ to refer to Income returns TRUE:

> is.numeric(df_temp$Income)
[1] TRUE

Using [,2] or [,which(...)] to refer to Income returns FALSE:

> i <- which(names(df_temp)=='Income')
> is.numeric(df_temp[,i])
[1] FALSE
> is.numeric(df_temp[,2])
[1] FALSE

When I try to convert this column to numeric using [,], I run into another issue:

> df_temp[,2] <- as.numeric(df_temp[,2])
Error: (list) object cannot be coerced to type 'double'

I always thought that $ and [] serve the same purpose when referring to a vector in a data frame.

Could somebody please help me understand the problem and convert this vector to numeric using the [,] expression?

Upvotes: 3

Views: 146

Answers (3)

John Palowitch

Reputation: 307

To fully answer the question, $ and [ do serve the same purpose on a standard data.frame object:

Custid <- c(1003, 1004, 1006, 1007, 1009, 1010)
Income <- c(29761.20, 98249.55, 23505.30, 72959.25, 114973.95, 25038.30)
mydf <- data.frame(Custid, Income)
class(mydf$Income); class(mydf[ , 2])
# [1] "numeric"
# [1] "numeric"

You're dealing with a tbl_df object:

library(dplyr)
mytbl_df <- tbl_df(mydf)
print(mytbl_df)
## A tibble: 6 × 2
#  Custid    Income
#   <dbl>     <dbl>
#1   1003  29761.20
#2   1004  98249.55
#3   1006  23505.30
#4   1007  72959.25
#5   1009 114973.95
#6   1010  25038.30

To get [ to work as usual on mytbl_df, just convert it back into a data.frame: newdf <- as.data.frame(mytbl_df).
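
As a minimal sketch of that conversion (assuming the tibble package is installed; plain_df is a name chosen here for illustration, and the values are the six rows from the question):

```r
library(tibble)

# Rebuild the question's data as a tibble (tbl_df); values copied from the question.
df_temp <- tibble(
  Custid = c(1003, 1004, 1006, 1007, 1009, 1010),
  Income = c(29761.20, 98249.55, 23505.30, 72959.25, 114973.95, 25038.30)
)

# Convert to a plain data.frame; the name plain_df is illustrative only.
plain_df <- as.data.frame(df_temp)

# On a plain data.frame, [ , j] with a single column drops to a vector by default.
class(plain_df)            # "data.frame"
is.numeric(plain_df[, 2])  # TRUE
```

The trade-off is that the converted object no longer prints or subsets like a tibble, so this suits code that expects base data.frame behaviour throughout.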

Upvotes: 3

akrun

Reputation: 887118

We have a tbl_df object, so extracting a column with [ still returns a tbl_df, i.e.

df_temp[,i]
# A tibble: 6 × 1
#     Income
#      <dbl>
#1  29761.20
#2  98249.55
#3  23505.30
#4  72959.25
#5 114973.95
#6  25038.30

We can use [[ to extract the column as a vector instead:

df_temp[[i]]
#[1]  29761.20  98249.55  23505.30  72959.25 114973.95  25038.30


is.numeric(df_temp[[i]])
#[1] TRUE
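
The same [[ form can also appear on the left-hand side of an assignment, which covers the conversion the question asks about. A small sketch, assuming the tibble package is installed; df_chr and its text-valued Income column are hypothetical and only serve to give as.numeric() something to convert:

```r
library(tibble)

# Hypothetical data: Income was read in as character rather than numeric.
df_chr <- tibble(
  Custid = c(1003L, 1004L, 1006L),
  Income = c("29761.20", "98249.55", "23505.30")
)
i <- which(names(df_chr) == "Income")

# [[ extracts a plain vector, so as.numeric() can coerce it,
# and [[<- writes the converted vector back into the same column.
df_chr[[i]] <- as.numeric(df_chr[[i]])

is.numeric(df_chr[[i]])
```

Unlike df_chr[, i], which hands as.numeric() a one-column tibble (a list) and triggers the coercion error from the question, df_chr[[i]] passes the column's vector directly.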

data

df_temp <- structure(list(Custid = c(1003L, 1004L, 1006L, 1007L, 1009L, 
1010L), Income = c(29761.2, 98249.55, 23505.3, 72959.25, 114973.95, 
25038.3)), .Names = c("Custid", "Income"), row.names = c("1", 
"2", "3", "4", "5", "6"), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Joshua Ulrich

Reputation: 176648

You're not working with a data.frame. You're working with a "tbl_df". Subsetting a tbl_df using $ returns a vector. Subsetting a tbl_df using [ returns a tbl_df, and a tbl_df is not a numeric vector, so is.numeric returns FALSE.

One thing tbl_df does is use drop = FALSE when calling [. But it goes even further by actively preventing you from setting drop = TRUE:

x <- tbl_df(mtcars)
is.numeric(x[,"cyl",drop=TRUE])
# [1] FALSE
# Warning messages:
# 1: drop ignored

So, you cannot use [ with a tbl_df in the way you want. You have to use $ or [[ to extract the vector.

is.numeric(x$cyl)
# [1] TRUE
is.numeric(x[["cyl"]])
# [1] TRUE
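
To see the three accessors side by side, here is a short sketch. It assumes the tibble package is installed and uses as_tibble(), which replaces the deprecated tbl_df() in current releases; the variable names are illustrative:

```r
library(tibble)

x <- as_tibble(mtcars)

by_bracket <- x[, "cyl"]    # single bracket: still a one-column tibble
by_dollar  <- x$cyl         # $ extracts the column vector
by_double  <- x[["cyl"]]    # [[ extracts the same column vector

inherits(by_bracket, "tbl_df")   # TRUE
is.numeric(by_bracket)           # FALSE
is.numeric(by_dollar)            # TRUE
identical(by_dollar, by_double)  # TRUE
```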

Upvotes: 10
