Reputation: 701
I'm using the following code to search a list of items, and classify them if they contain a word that I'm looking for.
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
t = data.frame(yo)
b <- grep("winter", t$yo)
c <- grep("downline", t$yo)
d <- grep("Hurricane",t$yo)
t$y2[b] = "Winter"
t$y2[c] = "downline"
t$y2[d] = "Hurricane"
Output:
            yo        y2
1       winter    Winter
2 winter storm    Winter
3     downline  downline
4    Hurricane Hurricane
5            c    Winter
6            c    Winter
Any idea why the last two rows are categorized as Winter, even though grep does not match either of them?
Upvotes: 1
Reputation: 652
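When you run t$y2[b] = "Winter", the y2 column does not exist yet, so t$y2 is NULL. R first builds a new vector from the assignment: since b is c(1, 2), that vector is c("Winter", "Winter"), of length 2. Because all the columns of a data.frame must have the same length, this length-2 vector is then recycled three times to fill the 6 rows of the column.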
You can type this to see what happened:
t = data.frame(yo)
t$y2[b] = "Winter"
t
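To make the two steps explicit, here is a small sketch that reproduces by hand what R does for that assignment. The names df and tmp are only illustrative (df stands in for t); the sample data is the one from the question.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
df <- data.frame(yo)          # 'df' stands in for 't' from the question
b <- grep("winter", df$yo)    # indices of the matching rows: 1 and 2

# Step 1: df$y2 is NULL, so the sub-assignment builds a brand-new vector
tmp <- df$y2                  # NULL, the column does not exist yet
tmp[b] <- "Winter"            # now a character vector of length 2
length(tmp)

# Step 2: the length-2 vector is assigned to a 6-row data frame and recycled
df$y2 <- tmp
df$y2
```

After step 2 every row holds "Winter", which is why rows 5 and 6 keep that value once the later assignments overwrite only rows 3 and 4.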
If the right-hand side is a two-element vector instead of "Winter", you can see those two elements being repeated down the column:
t = data.frame(yo)
t$y2[b] = c("Winter", "Not Winter")
t
If the number of rows of t is not a multiple of the length of the vector created by the assignment (here 5, the largest index), R raises an error:
t = data.frame(yo)
t$y2[1:5] = "Winter"
t
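If you want to check this without stopping your script, one possible sketch is to wrap the assignment in tryCatch(). The names df and msg are illustrative and not part of the original code.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
df <- data.frame(yo)

# Indexing with 1:5 creates a length-5 vector, and 6 rows is not a multiple of 5
msg <- tryCatch(
  {
    df$y2[1:5] <- "Winter"
    "no error"
  },
  error = function(e) conditionMessage(e)
)

msg                    # the error message raised by the assignment
"y2" %in% names(df)    # the failed assignment did not add the column
```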
A simple fix is to initialize y2 with a default value before using it:
t = data.frame(yo)
b <- grep("winter", t$yo)
c <- grep("downline", t$yo)
d <- grep("Hurricane",t$yo)
t$y2 = ""
t$y2[b] = "Winter"
t$y2[c] = "downline"
t$y2[d] = "Hurricane"
Output:
            yo        y2
1       winter    Winter
2 winter storm    Winter
3     downline  downline
4    Hurricane Hurricane
5            c          
6            c          
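As a variation on the same fix, here is a sketch that uses grepl() with logical indexing and NA as the default instead of an empty string. This is an alternative under my own assumptions, not what the code above does; df is an illustrative name standing in for t.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
df <- data.frame(yo)

# Create the full-length column first, with NA marking "no category"
df$y2 <- NA_character_

# grepl() returns one TRUE/FALSE per row, so each assignment stays row-aligned
df$y2[grepl("winter", df$yo)] <- "Winter"
df$y2[grepl("downline", df$yo)] <- "downline"
df$y2[grepl("Hurricane", df$yo)] <- "Hurricane"

df$y2
```

With NA as the default, is.na(df$y2) picks out the rows that matched none of the patterns.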
Upvotes: 4