steppermotor

Reputation: 701

Categorizing in R

I'm using the following code to search a list of items, and classify them if they contain a word that I'm looking for.

 yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")

 t = data.frame(yo)

 b <- grep("winter", t$yo)
 c <- grep("downline", t$yo)
 d <- grep("Hurricane",t$yo)

 t$y2[b] = "Winter" 
 t$y2[c] = "downline"
 t$y2[d] = "Hurricane"

Output:

        yo             y2
     winter          Winter
     winter storm    Winter
     downline        downline
     Hurricane       Hurricane
     c               Winter
     c               Winter

Why are the last two rows categorized as Winter, even though grep finds no match for them?

Upvotes: 1

Views: 83

Answers (1)

user1470500

Reputation: 652

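When you run t$y2[b] = "Winter", the y2 column does not exist yet, so t$y2 is NULL. Assigning into NULL at the positions in b (1 and 2) produces a vector of length 2, c("Winter", "Winter"). Because every column of a data.frame must have one element per row, that length-2 vector is recycled three times to fill all six rows, which is why rows 5 and 6 also end up as "Winter".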

You can type this to see what happened:

t = data.frame(yo)
t$y2[b] = "Winter"
t
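The same recycling can be reproduced step by step outside the data frame. This is only a sketch of what the assignment does behind the scenes; the names tmp and filled are introduced here for illustration and are not part of the original code.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
b <- grep("winter", yo)            # positions of the matches: 1 and 2

# Step 1: y2 does not exist yet, so the subassignment starts from NULL
tmp <- NULL
tmp[b] <- "Winter"                 # tmp is now a character vector of length 2

# Step 2: the short vector is recycled to the 6 rows of the data frame
filled <- rep(tmp, length.out = length(yo))
filled
```

Every element of filled is "Winter", including positions 5 and 6 that grep never matched.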

If you assign a vector of two different elements instead of "Winter", you can see that these two elements are repeated down the column:

t = data.frame(yo)
t$y2[b] = c("Winter", "Not Winter")
t

If the number of rows of t is not a multiple of the length of the vector being assigned (here 5, because of the index 1:5), R raises an error instead of recycling:

t = data.frame(yo)
t$y2[1:5] = "Winter"
t
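To look at that error without stopping a script, the assignment can be wrapped in tryCatch. This is a minimal sketch; the variable msg is a name chosen here, and the exact wording of the message may vary with the R version and locale.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
t <- data.frame(yo)

# Indexing 1:5 builds a length-5 vector, and 6 rows is not a multiple of 5,
# so the data frame refuses to recycle it
msg <- tryCatch({
  t$y2[1:5] <- "Winter"
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

Because the assignment fails, t is left unchanged and still has no y2 column.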

A simple fix is to initialize y2 with a default value before using it:

t = data.frame(yo)

b <- grep("winter", t$yo)
c <- grep("downline", t$yo)
d <- grep("Hurricane",t$yo)

t$y2 = ""

t$y2[b] = "Winter" 
t$y2[c] = "downline"
t$y2[d] = "Hurricane"
t

Output:

            yo        y2
1       winter    Winter
2 winter storm    Winter
3     downline  downline
4    Hurricane Hurricane
5            c
6            c
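If more categories get added later, the same idea can be written as a loop over a lookup vector. The following is only a sketch under some assumptions: the named vector labels is hypothetical, NA is used as the default instead of an empty string so that unmatched rows are easy to spot, and fixed = TRUE treats each pattern as a literal string rather than a regular expression.

```r
yo <- c("winter", "winter storm", "downline", "Hurricane", "c", "c")
t <- data.frame(yo)

# Hypothetical lookup: names are the search patterns, values are the labels
labels <- c(winter = "Winter", downline = "downline", Hurricane = "Hurricane")

# Create the full-length column first so nothing gets recycled later
t$y2 <- NA_character_

for (pattern in names(labels)) {
  hits <- grepl(pattern, t$yo, fixed = TRUE)   # one TRUE/FALSE per row
  t$y2[hits] <- labels[[pattern]]
}
t
```

Using grepl rather than grep gives a logical vector with one entry per row, so the assignment only ever touches rows that actually matched.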

Upvotes: 4
