Aberdeen24

Reputation: 3

Check if row contains a substring, print value in another column (R programming)

Goal:

I am looking at example Twitter data and want to check whether the text in the column "Tweet" contains the phrase "yo creo". If a tweet contains "yo creo", I would like to put a "1" in the column "SubjectExpression" for that row.

Error:

I am receiving the error: Must subset columns with a valid subscript vector. x Subscript has the wrong type logical. ℹ It must be numeric or character.

Here is my code:

#Read in data
MyData <-read.csv("/Users/mydata/Desktop/MyData.csv")

#Append subject expression column to dataframe
MyData$SubjectExpression <- ""

#Count instances of subject expression using select
MyData%>%
  mutate(SubjectExpression)= 
  case_when(
    select(MyData, Tweet, contains("yo creo") == '1')
  )

Upvotes: 0

Views: 34

Answers (2)

Andre Wildberg

Reputation: 19088

A base R alternative using grepl(). It returns TRUE/FALSE for each row, and multiplying by 1 converts that to 1/0:

MyData$SubjectExpression <- grepl("yo creo", MyData$Tweet)*1
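To make the one-liner above easier to try out, here is a minimal sketch on a made-up data frame. The sample tweets and the extra column name SubjectExpressionAnyCase are assumptions for illustration only and are not part of the original data. It uses as.integer() in place of *1, and fixed = TRUE so the pattern is treated as a literal string rather than a regular expression.

```r
# Hypothetical sample data standing in for MyData.csv
MyData <- data.frame(
  Tweet = c("yo creo que es verdad", "no lo se", "Yo creo que no", NA),
  stringsAsFactors = FALSE
)

# grepl() gives TRUE/FALSE per row (FALSE for NA); as.integer() turns that into 1/0
MyData$SubjectExpression <- as.integer(
  grepl("yo creo", MyData$Tweet, fixed = TRUE)
)

# Case-insensitive variant (hypothetical extra column) that also matches "Yo creo"
MyData$SubjectExpressionAnyCase <- as.integer(
  grepl("yo creo", MyData$Tweet, ignore.case = TRUE)
)

print(MyData)
```

Whether capitalized tweets such as "Yo creo" should count is a decision about the data, so the case-insensitive column is only shown as an option.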

Upvotes: 1

Gregor Thomas

Reputation: 145765

You've got a few issues.

  • mutate() syntax is data %>% mutate(column = value): the definition of the new column has to stay inside mutate's parentheses.
  • Inside most dplyr functions, including mutate(), you can use column names directly and unquoted. You don't need to select() a column (select() is for keeping some columns and dropping others).
  • case_when() argument syntax is test_1 ~ value_1, test_2 ~ value_2.
  • contains() is specifically made for column names. To detect the presence of a string in a column/vector, we'll use stringr::str_detect().
  • mutate() can create brand new columns. You don't need to initialize the column with MyData$SubjectExpression <- "". You should just delete that line.

Making all those changes, we get this:

library(dplyr)

MyData %>%
  mutate(SubjectExpression = 
    case_when(
      stringr::str_detect(Tweet, "yo creo") ~ 1,  # tweet contains "yo creo"
      TRUE ~ 0                                     # every other row
    )
  )
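For anyone who wants to run this end to end, here is a small self-contained sketch. The sample tibble is made up for illustration, and wrapping the pattern in stringr::fixed() is an optional choice (it matches the text literally instead of as a regular expression). Note that mutate() returns a new data frame rather than modifying MyData in place, so the result is assigned back here to keep the new column.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data standing in for MyData.csv
MyData <- tibble(Tweet = c("yo creo que es verdad", "no lo se", NA))

# Assign the result back so the new column is kept
MyData <- MyData %>%
  mutate(
    SubjectExpression = case_when(
      str_detect(Tweet, fixed("yo creo")) ~ 1,  # literal match on "yo creo"
      TRUE ~ 0                                  # everything else, including NA tweets
    )
  )

print(MyData)
```

With the TRUE ~ 0 fallback, rows where Tweet is NA end up as 0. If missing tweets should stay missing, as.integer(str_detect(Tweet, "yo creo")) inside mutate() would keep them as NA instead.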

Upvotes: 0
