Reputation: 217
How can I remove every row whose first column (text) starts with "RT"?
structure(list(text = structure(c(4L, 6L, 1L, 2L, 5L, 3L), .Label = c("@AirAsia @AirAsiaId finally they let us fly with 9.20 flight today. Manual boarding pass. Phew, that was a great relief!",
"@AirAsia your direct debit (Maybank) payment gateways is not working. Is it something you are working to fix?",
"RT @AirAsia: Kindly note that CIMB Direct Debit service will be unavailable tonight from (GMT+8) 1145hrs on 31 Jan until 0600hrs on 3 Feb 2…",
"RT @AirAsia: Skipped breakfast this morning? Now you can enjoy a great breakfast onboard with our new breakfast meals! http://t.co/957ZaLjY…",
"xdek ke flight @AirAsia Malaysia to LA... hahah..bagi la promo murah2 sikit, kompom aku beli...",
"You know there is a problem when customer service asks you to wait for 103 minutes and your no is 42 in the queue. @AirAsia"
), class = "factor"), created = structure(c(5L, 4L, 4L, 3L, 2L,
1L), .Label = c("1/2/2014 16:14", "1/2/2014 17:00", "3/2/2014 0:54",
"3/2/2014 0:58", "3/2/2014 1:28"), class = "factor")), .Names = c("text",
"created"), class = "data.frame", row.names = c(NA, -6L))
Upvotes: 2
Views: 560
Reputation: 9687
All of the above work; I prefer subset because it is a bit more legible:
no.rts <- subset( tweets, ! grepl("^RT ", text) )
Upvotes: 0
Reputation: 99341
grepl also works. Assuming d is the data set,
> d[!grepl("^RT", d$text), ]
## text created
## 2 You know there...@AirAsia 3/2/2014 0:58
## 3 @AirAsia... great relief! 3/2/2014 0:58
## 4 @AirAsia...orking to fix? 3/2/2014 0:54
## 5 xdek ke flight ... 1/2/2014 17:00
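To make the indexing step explicit, here is a minimal sketch on a small made-up data frame. The three rows below are hypothetical stand-ins, not the question's tweets, and the names is_rt and kept are only illustrative.

```r
# Hypothetical three-row stand-in for the question's data (not the real tweets).
d <- data.frame(
  text = c("RT @AirAsia: hello", "@AirAsia thanks", "RT @AirAsia: bye"),
  created = c("3/2/2014 1:28", "3/2/2014 0:58", "1/2/2014 16:14"),
  stringsAsFactors = TRUE  # the question's columns are factors
)

# grepl() returns one TRUE/FALSE per row; a factor is matched on its labels.
is_rt <- grepl("^RT", d$text)

# Negate the logical mask to keep only the rows that do not start with "RT".
kept <- d[!is_rt, ]
```

Because the mask is logical rather than a set of indices, this form also behaves sensibly when no row matches: every row is kept.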
Upvotes: 4
Reputation: 16080
Or use the stri_sub function from the stringi package to get the first two characters and then check whether they are equal to "RT":
require(stringi)
df[stri_sub(df$text,1,2)!="RT",]
Upvotes: 0
Reputation: 78812
Assuming your data frame is called tweets, then
no.rts <- tweets[grep("^RT ", tweets$text, invert=TRUE),]
will do what you want (and put the results in a new data frame called no.rts).
The grep statement says to ignore all lines in tweets$text that begin (^) with RT followed by a space. Without the invert=TRUE it would select all the lines beginning with RT.
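The effect of invert can be checked on a small example. This is only a sketch: the three-row tweets data frame below is made up for illustration, and rt_rows and other_rows are hypothetical names.

```r
# Hypothetical stand-in for the tweets data frame.
tweets <- data.frame(
  text = c("RT @AirAsia: hello", "@AirAsia thanks", "RTL design notes"),
  stringsAsFactors = FALSE
)

# Without invert, grep() returns the indices of the rows that match.
rt_rows <- grep("^RT ", tweets$text)

# With invert = TRUE, it returns the indices of the rows that do not match.
other_rows <- grep("^RT ", tweets$text, invert = TRUE)

# Index the data frame with the non-matching rows.
no.rts <- tweets[other_rows, , drop = FALSE]
```

Note the trailing space in the pattern: the row "RTL design notes" is kept here, whereas the pattern "^RT" without the space would drop it as well.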
Upvotes: 4