Reputation: 60
I have what I thought was a simple task: check if a dataframe element equals a specific value.
I have data on cereal, with columns like Brand, Manufacturer, Calories, etc.
Here's an example row:
> data[1, ]
Brand Manufacturer Calories Protein Fat Sodium Fiber Carbohydrates Sugar Potassium
1 ACCheerios G 110 2 2 180 1.5 10.5 10 70
I thought this would work:
> 'G' == data[1, ]$Manufacturer
[1] FALSE
or maybe
> 'G' %in% data[1, ]$Manufacturer
[1] FALSE
but neither does.
I've checked these previous questions: "Check if value is in data frame" and "Check whether value exist in one data frame or not".
Neither seems to do what I want. The only solution I've found so far is to use unlist()
on the row first, which causes the Brand / Manufacturer to become numbers
> unlist(data[1, ])
Brand Manufacturer Calories Protein Fat Sodium Fiber Carbohydrates Sugar
1.0 1.0 110.0 2.0 2.0 180.0 1.5 10.5 10.0
Potassium
70.0
and then both solutions above work fine, as long as I use 1 instead of G.
> 1 == unlist(data[1, ])[2]
Manufacturer
TRUE
> 1 %in% unlist(data[1, ])[2]
[1] TRUE
My thought is that this isn't working because when I simply print out data[1, ]$Manufacturer, more than just the column is returned. Something about "Levels" is returned too:
> data[1, ]$Manufacturer
[1] G
Levels: G K Q
I've tried doing data[1, ]$Manufacturer[1] (and various other things to try and just return the G) to no avail.
How do I do this relatively simple task?
Upvotes: 0
Views: 1871
Reputation: 420
The most important thing here is the following:
My thought is that this isn't working because when I simply print out data[1, ]$Manufacturer, more than just the column is returned. Something about "Levels" is returned too:
If the output includes "Levels", your variable is a factor. I really recommend you visit this page or this page to learn about basic data types in R.
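To make the factor point concrete, here is a minimal sketch of what a factor stores. The vector `manufacturer` is a hypothetical stand-in for the Manufacturer column, not the asker's actual data.

```r
# Hypothetical stand-in for the Manufacturer column
manufacturer <- factor(c("G", "K", "Q", "G"))

class(manufacturer)         # "factor"
levels(manufacturer)        # the distinct labels: "G" "K" "Q"
as.integer(manufacturer)    # the underlying integer codes: 1 2 3 1
as.character(manufacturer)  # the labels as plain strings: "G" "K" "Q" "G"

# Comparing against the label rather than the integer code
as.character(manufacturer[1]) == "G"
```

The integer codes are what unlist() exposes when a factor is flattened together with numeric columns, which is why the row in the question turned into numbers.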
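Create a data frame:
# stringsAsFactors = TRUE reproduces the factor column on R >= 4.0.0, where the default is FALSE
df <- data.frame(A = c("Aa", "Bb", "Cc", "Dd"), B = c(1, 2, 3, 4), C = c(5, 6, 7, 8), stringsAsFactors = TRUE)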
str(df)
#> 'data.frame': 4 obs. of 3 variables:
#> $ A: Factor w/ 4 levels "Aa","Bb","Cc",..: 1 2 3 4
#> $ B: num 1 2 3 4
#> $ C: num 5 6 7 8
Following the example SO post you said you tried:
"Aa" %in% df
#> [1] FALSE
any(df=="Aa")
#> [1] TRUE
Reduce("|", df=="Aa")
#> [1] TRUE
length(which(df=="Aa"))>0
#> [1] TRUE
is.element("Aa",unlist(df[1,]))
#> [1] FALSE
Moving to your original attempts:
## Search when column 1 ("A") is a *factor*
"Aa" %in% df[1,] # Checking the entire row for a CHARACTER element "Aa"
#> [1] FALSE
"Aa" %in% df[1,1] # Checking the first row and first column for character element "Aa"
#> [1] TRUE
"Aa" %in% df$A # Index a specific column
#> [1] TRUE
factor("Aa") %in% df$A # Also works if we specify the search criterion as a factor
#> [1] TRUE
Search when column 1 ("A") is a character:
# Force df$A to a character
df$A <- as.character(df$A)
str(df) #now we have a character, numeric, and numeric
#> 'data.frame': 4 obs. of 3 variables:
#> $ A: chr "Aa" "Bb" "Cc" "Dd"
#> $ B: num 1 2 3 4
#> $ C: num 5 6 7 8
You were originally trying to scan the entire row:
"Aa" %in% df[1,]
#> [1] TRUE
Try the other options again, just to check the behavior:
"Aa" %in% df[1,1] # Checking the first row and first column for character element "Aa"
#> [1] TRUE
"Aa" %in% df$A # Index a specific column
#> [1] TRUE
factor("Aa") %in% df$A # Also works if we specify the search criterion as a factor
#> [1] TRUE
Created on 2019-04-06 by the reprex package (v0.2.1)
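If you control how the data frame is built, another option is to avoid the factor conversion up front rather than undoing it afterwards. A minimal sketch, assuming the same toy columns as above; `df_chr` is a name made up for this example.

```r
# stringsAsFactors = FALSE keeps text columns as character vectors.
# (This is the default from R 4.0.0 onward; older versions defaulted to TRUE.)
df_chr <- data.frame(
  A = c("Aa", "Bb", "Cc", "Dd"),
  B = c(1, 2, 3, 4),
  C = c(5, 6, 7, 8),
  stringsAsFactors = FALSE
)

is.character(df_chr$A)   # the text column is no longer a factor
"Aa" %in% df_chr[1, ]    # scanning the whole first row
"Aa" == df_chr[1, "A"]   # comparing a single cell
```

The same argument is accepted by read.csv() and read.table(), so it can be set when the file is first read.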
Upvotes: 3
Reputation: 529
The comparisons you want work once you convert the data.frame to a data.table:
library(data.table)
data <- as.data.table(data)
'G' == data[1, ]$Manufacturer
[1] TRUE
'G' %in% data[1, ]$Manufacturer
[1] TRUE
or if you convert the factor column to character:
data$Manufacturer <- as.character(data$Manufacturer)
'G' == data[1, ]$Manufacturer
[1] TRUE
'G' %in% data[1, ]$Manufacturer
[1] TRUE
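If you would rather not overwrite the column, the conversion can be applied only at the point of comparison. A minimal sketch on a hypothetical two-row miniature of the cereal data (`cereal` and the second brand name are made up for illustration):

```r
# Hypothetical miniature of the cereal data
cereal <- data.frame(
  Brand = c("ACCheerios", "OtherBrand"),
  Manufacturer = c("G", "K"),
  stringsAsFactors = TRUE
)

# Convert only the value being compared; the column itself stays a factor
as.character(cereal$Manufacturer[1]) == "G"
"G" %in% as.character(cereal$Manufacturer)

# dput() shows the labels exactly as stored, quotes included
dput(levels(cereal$Manufacturer))
```

Inspecting the labels with dput() is only a suggestion for checking what the column really contains, for instance whether a label carries padding such as "G "; nothing in the question confirms that this is the case.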
Upvotes: 0
Reputation: 13309
Just do:
df[which(df=="G")]
Manufacturer
1 G
Or go for good old apply:
apply(df,2,function(x) "G"%in%x)
Brand Manufacturer Calories Protein Fat Sodium Fiber Carbohydrates
FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Sugar Potassium
FALSE FALSE
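For a data frame with more than one row, a plain which() returns linear indices that no longer line up with column numbers, so it may be safer to ask for row and column positions explicitly. A rough sketch on a hypothetical two-row data frame (the values are invented for illustration):

```r
# Hypothetical two-row stand-in for the cereal data
df <- data.frame(
  Brand = c("ACCheerios", "OtherBrand"),
  Manufacturer = c("G", "K"),
  Calories = c(110, 70),
  stringsAsFactors = TRUE
)

# Row and column position of every cell equal to "G"
hits <- which(df == "G", arr.ind = TRUE)
hits

# Per-column check; sapply() visits each column without the
# data-frame-to-matrix coercion that apply() performs
has_g <- sapply(df, function(col) "G" %in% as.character(col))
has_g
```

Here `hits` has one row per match with "row" and "col" columns, and `has_g` is a named logical vector with one entry per column.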
Upvotes: 0