Check if row contains a substring, print value in another column (R programming)

Question

Goal:

I am looking at example Twitter data and want to check whether the "Tweet" column contains the string "yo creo". If a tweet contains "yo creo", I would like to put a 1 in the "SubjectExpression" column.

Error:

I am receiving the error: Must subset columns with a valid subscript vector. x Subscript has the wrong type logical. ℹ It must be numeric or character.

Here is my code:

#Read in data
MyData <-read.csv("/Users/mydata/Desktop/MyData.csv")

#Append subject expression column to dataframe
MyData$SubjectExpression <- ""

#Count instances of subject expression using select
MyData%>%
  mutate(SubjectExpression)= 
  case_when(
    select(MyData, Tweet, contains("yo creo") == '1')
  )

Gregor Thomas · Accepted Answer

You've got a few issues.

mutate syntax is data %>% mutate(column = value): the definition of the new column must stay inside mutate's parentheses.
Inside most dplyr functions, including mutate(), you can use column names directly and unquoted. You don't need to select() a column (select() is for keeping some columns and dropping others).
case_when() argument syntax is test_1 ~ value_1, test_2 ~ value_2.
contains() is made specifically for column names; to detect the presence of a string in a column/vector, we'll use stringr::str_detect.
mutate() can create brand-new columns, so you don't need to initialize the column with MyData$SubjectExpression <- "". You should just delete that line.
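To illustrate the difference between contains() and str_detect() mentioned above, here is a minimal sketch. The data frame tweets and its columns TweetId and Author are made up for illustration and are not part of the original question.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (not the asker's file)
tweets <- data.frame(
  Tweet = c("yo creo que si", "no se"),
  TweetId = c(1, 2),
  Author = c("a", "b"),
  stringsAsFactors = FALSE
)

# contains() is a selection helper: it matches column *names*,
# so this keeps every column whose name includes "Tweet"
name_match <- tweets %>% select(contains("Tweet"))
print(names(name_match))

# str_detect() tests the *values* of a character vector,
# returning one TRUE/FALSE per element
value_match <- str_detect(tweets$Tweet, "yo creo")
print(value_match)
```

Because contains() only looks at column names, it cannot answer "does this tweet include a phrase"; that question needs a per-row test such as str_detect().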

Making all those changes, we get this:

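library(dplyr)

# Flag tweets containing "yo creo": 1 if present, 0 otherwise.
# Assign the result back so the new column is kept in MyData.
MyData <- MyData %>%
  mutate(SubjectExpression =
    case_when(
      stringr::str_detect(Tweet, "yo creo") ~ 1,
      TRUE ~ 0
    )
  )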
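For anyone who wants to try this without the original CSV, the following is a small self-contained sketch. The sample tweets are invented, and the second column, SubjectExpressionBase, is a hypothetical name used only to show that a base R grepl() call should give the same flags.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data standing in for MyData.csv
MyData <- data.frame(
  Tweet = c("yo creo que si", "no lo se", "Pues yo creo que no", NA),
  stringsAsFactors = FALSE
)

MyData <- MyData %>%
  mutate(
    # 1 when the tweet contains "yo creo"; everything else,
    # including missing tweets, falls through to 0
    SubjectExpression = case_when(
      str_detect(Tweet, "yo creo") ~ 1,
      TRUE ~ 0
    ),
    # Base R alternative: grepl() returns FALSE for NA input,
    # and as.integer() turns TRUE/FALSE into 1/0
    SubjectExpressionBase = as.integer(grepl("yo creo", Tweet, fixed = TRUE))
  )

print(MyData)
```

Note that the match is case-sensitive in both versions, so a tweet starting with "Yo creo" would not be flagged unless the pattern is adjusted.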
