Reputation: 3
Goal:
I am working with example Twitter data and want to check whether each value in the column "Tweet" contains the string "yo creo". If a tweet contains "yo creo", I would like to put a "1" in the column "Subject Expression".
Error:
I am receiving the error:
Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical.
ℹ It must be numeric or character.
Here is my code:
#Read in data
MyData <-read.csv("/Users/mydata/Desktop/MyData.csv")
#Append subject expression column to dataframe
MyData$SubjectExpression <- ""
#Count instances of subject expression using select
MyData%>%
mutate(SubjectExpression)=
case_when(
select(MyData, Tweet, contains("yo creo") == '1')
)
Upvotes: 0
Views: 34
Reputation: 19088
A base R alternative using grepl():
MyData$SubjectExpression <- grepl("yo creo", MyData$Tweet)*1
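To illustrate why this works: grepl() returns one TRUE/FALSE per row, and multiplying by 1 coerces that to 1/0. Here is a minimal sketch on a small made-up data frame (the tweets are invented for illustration and stand in for the real MyData.csv):

```r
# Hypothetical sample data; the real data would come from read.csv()
MyData <- data.frame(
  Tweet = c("yo creo que si", "no lo se", "Pues yo creo que no"),
  stringsAsFactors = FALSE
)

# grepl() gives a logical vector; multiplying by 1 converts it to numeric 1/0
MyData$SubjectExpression <- grepl("yo creo", MyData$Tweet) * 1

print(MyData)
```

Note that the match is case-sensitive by default, so a tweet starting with "Yo creo" would get 0 unless you pass ignore.case = TRUE.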
Upvotes: 1
Reputation: 145765
You've got a few issues.
mutate syntax is data %>% mutate(column = value): you need to keep the definition of the new column inside mutate's ().
In dplyr functions, including mutate(), you can use column names directly and unquoted. You don't need to select() a column (select() is for keeping some columns and dropping others).
case_when() argument syntax is test_1 ~ value_1, test_2 ~ value_2.
contains() is specifically made for column names; to detect the presence of a string in a column/vector we'll use stringr::str_detect.
mutate() can create brand new columns. You don't need to initialize the column with MyData$SubjectExpression <- "". You should just delete that line.
Making all those changes, we get this:
MyData %>%
  mutate(
    SubjectExpression = case_when(
      stringr::str_detect(Tweet, "yo creo") ~ 1,
      TRUE ~ 0
    )
  )
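One thing to keep in mind: the pipeline above prints the result but does not change MyData itself. To keep the new column, assign the result back. A minimal sketch, assuming dplyr and stringr are installed and using an invented data frame in place of the real CSV:

```r
library(dplyr)

# Hypothetical sample data; the real data would come from read.csv()
MyData <- data.frame(
  Tweet = c("yo creo que si", "no lo se", "Pues yo creo que no"),
  stringsAsFactors = FALSE
)

# Assign the result back so the new column is actually stored
MyData <- MyData %>%
  mutate(
    SubjectExpression = case_when(
      stringr::str_detect(Tweet, "yo creo") ~ 1,  # tweet contains the phrase
      TRUE ~ 0                                     # everything else
    )
  )

print(MyData)
```

With only two outcomes, if_else(stringr::str_detect(Tweet, "yo creo"), 1, 0) would do the same job; case_when() becomes more useful once there are several phrases to test.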
Upvotes: 0