Reputation: 157
I am having an issue with subsetting my Spark DataFrame. I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:
nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")
nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")
However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:
nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))
# Error in as.character.default(x) :
# no method for coercing this S4 class to a vector
I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?
Thanks!
Upvotes: 3
Views: 474
Reputation: 35229
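Standard R functions such as grep and grepl cannot be applied to a SparkDataFrame: nfe$ITEM_PRODUTO is a Spark Column (an S4 object), not an R character vector, which is why the coercion fails. Use either like: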
where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
or rlike:
where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))
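For anyone who wants to try this end to end, here is a minimal sketch. It assumes SparkR is installed and a local Spark session can be started; the three sample rows and the names local_df, areia_like, areia_rlike and areia_subset are made up for illustration and are not from the question. As far as I know, rlike looks for the pattern anywhere in the string, so the plain pattern "AREIA" should behave like ".*AREIA.*".

```r
library(SparkR)

# Start a local Spark session (assumes Spark is installed and discoverable).
sparkR.session(master = "local[1]")

# Hypothetical stand-in for the nfe DataFrame from the question.
local_df <- data.frame(
  ITEM_PRODUTO = c("AREIA LAVADA FINA", "CIMENTO CP II", "AREIA GROSSA"),
  stringsAsFactors = FALSE
)
nfe <- createDataFrame(local_df)

# SQL LIKE pattern: % matches any run of characters.
areia_like <- where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))

# Regular expression match on the same column.
areia_rlike <- where(nfe, rlike(nfe$ITEM_PRODUTO, "AREIA"))

# subset() also accepts a Column condition, so the call shape
# used in the question works once grep is replaced by like.
areia_subset <- subset(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))

# Pull the first rows back into a local R data.frame to inspect them.
head(areia_like)
```

The filtering runs inside Spark in all three cases; only head() brings rows back to the R process.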
Upvotes: 2