Luop90

Reputation: 60

Check if string equals dataframe element

I have what I thought was a simple task: check if a dataframe element equals a specific value.

I have data on cereal, with columns like Brand, Manufacturer, Calories, etc.

Here's an example row:

> data[1, ]
       Brand Manufacturer Calories Protein Fat Sodium Fiber Carbohydrates Sugar Potassium
1 ACCheerios            G      110       2   2    180   1.5          10.5    10        70

I thought this would work:

> 'G' == data[1, ]$Manufacturer
[1] FALSE

or maybe

> 'G' %in% data[1, ]$Manufacturer
[1] FALSE

but neither works.

I've checked these previous questions: "Check if value is in data frame" and "Check whether value exist in one data frame or not".

Neither seems to do what I want. The only workaround I've found so far is to call unlist() on the row first, which turns Brand and Manufacturer into numbers:

> unlist(data[1, ])
        Brand  Manufacturer      Calories       Protein           Fat        Sodium         Fiber Carbohydrates         Sugar 
          1.0           1.0         110.0           2.0           2.0         180.0           1.5          10.5          10.0 
    Potassium 
         70.0 

and then both solutions above work fine, as long as I use 1 instead of G.

> 1 == unlist(data[1, ])[2]
Manufacturer 
        TRUE 

> 1 %in% unlist(data[1, ])[2]
[1] TRUE
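The numbers come from how factors are stored: each value is an integer code that points into a table of labels (the levels). When unlist() combines a factor with numeric columns, only the codes survive. A minimal sketch using a made-up three-level factor rather than the real cereal data:

```r
# A factor stores integer codes plus a table of labels (levels)
f <- factor(c("G", "K", "Q"))
as.integer(f[1])     # the underlying code
as.character(f[1])   # the label

# unlist() on a mix of a factor and a number keeps only the code
row <- list(Manufacturer = f[1], Calories = 110)
unlist(row)
```

This is why comparing against 1 succeeds after unlist(): 1 is the code for the first level.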

My thought is that this isn't working because when I print data[1, ]$Manufacturer, more than just the value is returned. There is also a line about "Levels":

> data[1, ]$Manufacturer
[1]  G
Levels:  G  K  Q

I've tried doing data[1, ]$Manufacturer[1] (and various other things to try and just return the G) to no avail.

How do I do this relatively simple task?
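One detail worth checking: R normally prints a factor with a single space after `[1]` and after `Levels:`, while the output above shows two. That is what you would see if the labels carry a leading space (" G" instead of "G"), and then no comparison with "G" can succeed, because comparing a string with a factor already works by label. The sketch below only illustrates that hypothesis with made-up vectors; it does not use the actual cereal data.

```r
# Comparing a string with a factor compares against the label
clean <- factor(c("G", "K", "Q"))
"G" == clean[1]

# Hypothetical: the same labels with a leading space
padded <- factor(c(" G", " K", " Q"))
padded[1]            # prints with doubled spaces, like the output above
"G" == padded[1]     # " G" is not "G"

# Trimming whitespace from the levels fixes every value at once
levels(padded) <- trimws(levels(padded))
"G" == padded[1]
```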

Upvotes: 0

Views: 1871

Answers (3)

Jessica Burnett

Reputation: 420


The most important thing here is the following:

My thought is that this isn't working because when I simply print out data[1, ]$Manufacturer, more than just the column is returned. Something about "Levels" is returned too:

If the output includes "Levels", your variable is a factor. I really recommend you visit this page or this page to learn about basic data types in R.

Create a data frame

df = data.frame(A = c("Aa", "Bb", "Cc", "Dd"), B = c(1, 2, 3, 4), C = c(5, 6, 7, 8), stringsAsFactors = TRUE)  # factor column A; R >= 4.0 no longer does this by default
str(df) 
#> 'data.frame':    4 obs. of  3 variables:
#>  $ A: Factor w/ 4 levels "Aa","Bb","Cc",..: 1 2 3 4
#>  $ B: num  1 2 3 4
#>  $ C: num  5 6 7 8

Following the examples in the SO post you said you tried:

"Aa" %in% df
#> [1] FALSE
any(df=="Aa") 
#> [1] TRUE
Reduce("|", df=="Aa") 
#> [1] TRUE
length(which(df=="Aa"))>0 
#> [1] TRUE
is.element("Aa",unlist(df[1,])) 
#> [1] FALSE
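The difference between these results: `%in%` treats the data frame as a list of whole columns, whereas `==` compares cell by cell and returns a logical matrix. A small sketch (rebuilding the same `df`, with `stringsAsFactors = TRUE` so column A is a factor as in the `str()` output) that also reports where the match is:

```r
df <- data.frame(A = c("Aa", "Bb", "Cc", "Dd"), B = c(1, 2, 3, 4),
                 C = c(5, 6, 7, 8), stringsAsFactors = TRUE)

# Cell-by-cell comparison: a logical matrix with the same shape as df
hits <- df == "Aa"

# Row and column position of every matching cell
which(hits, arr.ind = TRUE)
```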

Moving to your original attempts:

## Search when column 1 ("A") is a *factor*
"Aa" %in% df[1,] # Checking the entire row for a CHARACTER element "Aa"
#> [1] FALSE
"Aa" %in% df[1,1] # Checking the first row and first column for character element "Aa"
#> [1] TRUE
"Aa" %in% df$A # Index a specific column
#> [1] TRUE
factor("Aa") %in% df$A # Also works if we specify the search criterion as a factor
#> [1] TRUE
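If you do want to scan a whole row while column A is still a factor, one option (a sketch, not the only way) is to turn each cell into text first, so factor cells contribute their labels instead of their codes:

```r
df <- data.frame(A = c("Aa", "Bb", "Cc", "Dd"), B = c(1, 2, 3, 4),
                 C = c(5, 6, 7, 8), stringsAsFactors = TRUE)

# One string per column of row 1: factor -> its label, number -> e.g. "1"
row_text <- vapply(df[1, ], as.character, character(1))

"Aa" %in% row_text
```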

Search when column 1 ("A") is a character:

# Force df$A to a character 
df$A <- as.character(df$A)
str(df) #now we have a character, numeric, and numeric
#> 'data.frame':    4 obs. of  3 variables:
#>  $ A: chr  "Aa" "Bb" "Cc" "Dd"
#>  $ B: num  1 2 3 4
#>  $ C: num  5 6 7 8
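With many factor columns, converting them one by one gets tedious. A common idiom (sketched here on the same toy `df`) converts every factor column to character and leaves the numeric columns untouched:

```r
df <- data.frame(A = c("Aa", "Bb", "Cc", "Dd"), B = c(1, 2, 3, 4),
                 C = c(5, 6, 7, 8), stringsAsFactors = TRUE)

# Convert factor columns to character; `df[] <-` keeps the data.frame
# structure instead of replacing df with a plain list
df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)
str(df)
```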

You were originally trying to scan the entire row:

"Aa" %in% df[1,] 
#> [1] TRUE

Try the other options again, just to check the behavior:

"Aa" %in% df[1,1] # Checking the first row and first column for character element "Aa"
#> [1] TRUE
"Aa" %in% df$A # Index a specific column
#> [1] TRUE
factor("Aa") %in% df$A # Also works if we specify the search criterion as a factor
#> [1] TRUE

Created on 2019-04-06 by the reprex package (v0.2.1)

Upvotes: 3

Grzegorz Sionkowski

Reputation: 529

The comparisons you tried work as expected if you convert the data.frame into a data.table:

library(data.table)
data <- as.data.table(data)
'G' == data[1, ]$Manufacturer
[1] TRUE
'G' %in% data[1, ]$Manufacturer
[1] TRUE

or changing factors into characters:

data$Manufacturer <- as.character(data$Manufacturer)
'G' == data[1, ]$Manufacturer
[1] TRUE
'G' %in% data[1, ]$Manufacturer
[1] TRUE
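If the data come from a CSV file (an assumption; the question does not say how they were loaded), the conversion can also happen at import time. `stringsAsFactors = FALSE` keeps text columns as character, and `strip.white = TRUE` removes any stray spaces around unquoted fields. The CSV text below is a made-up stand-in built from the example row:

```r
# Hypothetical CSV snippet; note the space before G
csv_text <- "Brand,Manufacturer,Calories
ACCheerios, G,110"

data <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                 strip.white = TRUE)

'G' == data[1, ]$Manufacturer
```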

Upvotes: 0

NelsonGon

Reputation: 13309

Just do:

# assumes df is the one-row data frame from the question, e.g. df <- data[1, ]
df[which(df=="G")]
  Manufacturer
1            G

Or go for good old apply:

apply(df,2,function(x) "G"%in%x)
        Brand  Manufacturer      Calories       Protein           Fat        Sodium         Fiber Carbohydrates 
        FALSE          TRUE         FALSE         FALSE         FALSE         FALSE         FALSE         FALSE 
        Sugar     Potassium 
        FALSE         FALSE 
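A variant of the apply() approach: apply() first converts the data frame to a matrix, while vapply() walks the columns as they are and guarantees one logical per column. A sketch on a small stand-in built from the example row (the column subset is chosen for illustration only):

```r
# Stand-in for the cereal data, using values from the example row
data <- data.frame(Brand = "ACCheerios", Manufacturer = "G",
                   Calories = 110, stringsAsFactors = TRUE)

# TRUE for each column that contains "G"; factor columns match by label
has_g <- vapply(data, function(x) "G" %in% x, logical(1))
has_g

# Names of the matching columns
names(has_g)[has_g]
```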

Upvotes: 0
