Mark H

Reputation: 23

Assign observations to a group based on another vector in the same R dataframe

I'm trying to assign areas to observations in a dataframe in R, based on grid square IDs. I have the following dataframe (df):

      year month  square
    1 2000     2      A1
    2 2000     2      B2
    3 2000     2      H5
    4 2000     2      J9
    5 2000     2      A2
    6 2000     3      N8
    7 2000     3      M9
    8 2000     3      C7

I'd like to add another column for "area", assigning each observation to "North", "East", "South" or "West" based on the grid square. I've tried the following for loops, but neither of them did anything:

    for(i in 1:length(df$square))  {
    for(j in 1:length(N)) {
    if(df$square[i]==N[j]){
    df$area[i]=="N"}
    }
    }

    for(i in 1:length(df$square))  {
    if(any(df$square==N)==T){
    df$area[i]=="North"}
    }

Where "N" is an object I created containing the squares located in the north, i.e.:

    N <- c("A1","A2","B2")

I did find the following related question, but I'm wondering if it's different when characters are involved: Assign a group number based on another column by group in R

Any help would be appreciated. Thanks

Upvotes: 0

Views: 260

Answers (3)

Bryan Goggin

Reputation: 2489

In R it is usually best to avoid loops, and especially nested loops. For this case, I prefer sapply().

    N <- c("A1", "A2", "B2")
    # assume these are the other designations
    S <- c("H5", "J9")
    E <- c("N8", "M9")
    W <- c("C7")

    # mydat is the data frame from the question (df there)
    mydat$area <- sapply(mydat$square, function(x) {
      if (x %in% N) return("North")
      if (x %in% S) return("South")
      if (x %in% E) return("East")
      if (x %in% W) return("West")
      NA  # square not found in any of the vectors
    })
    mydat

    year month square  area
    2000     2     A1 North
    2000     2     B2 North
    2000     2     H5 South
    2000     2     J9 South
    2000     2     A2 North
    2000     3     N8  East
    2000     3     M9  East
    2000     3     C7  West

With larger data sets, the *apply() functions are usually more concise than explicit loops, although the biggest speedups in R come from fully vectorized operations.
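As a rough sketch of a fully vectorized alternative (not part of the answer above), the four vectors can be combined into one named lookup vector and indexed by square. The data frame is copied from the question, the S, E and W designations are the same assumed ones as above, and `lookup` is just an illustrative name.

```r
# Example data copied from the question
df <- data.frame(year = 2000,
                 month = c(2, 2, 2, 2, 2, 3, 3, 3),
                 square = c("A1", "B2", "H5", "J9", "A2", "N8", "M9", "C7"),
                 stringsAsFactors = FALSE)

N <- c("A1", "A2", "B2")
# S, E and W are assumed designations, not given in the question
S <- c("H5", "J9")
E <- c("N8", "M9")
W <- c("C7")

# Named vector: the names are squares, the values are areas
lookup <- c(setNames(rep("North", length(N)), N),
            setNames(rep("South", length(S)), S),
            setNames(rep("East",  length(E)), E),
            setNames(rep("West",  length(W)), W))

# Index by name; as.character() guards against square being a factor.
# Squares missing from the lookup come back as NA.
df$area <- unname(lookup[as.character(df$square)])
```

This does the whole assignment in one indexing step, so there is no per-row function call at all.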

Upvotes: 0

r2evans

Reputation: 161085

Instead of defining vectors like N, I recommend a second data.frame pairing squares with areas:

    df <- data.frame(year = 2000,
                     month = c(2, 2, 2, 2, 2, 3, 3, 3),
                     square = c("A1", "B2", "H5", "J9", "A2", "N8", "M9", "C7"),
                     stringsAsFactors = FALSE)
    areas <- data.frame(square = c("A1", "A2", "B1", "H5", "J9", "M9", "N8"),
                        area = c("N", "N", "N", "W", "E", "S", "S"),
                        stringsAsFactors = FALSE)

With that, just do a merge:

    merge(df, areas, by = "square", all.x = TRUE)
    #   square year month area
    # 1     A1 2000     2    N
    # 2     A2 2000     2    N
    # 3     B2 2000     2 <NA>
    # 4     C7 2000     3 <NA>
    # 5     H5 2000     2    W
    # 6     J9 2000     2    E
    # 7     M9 2000     3    S
    # 8     N8 2000     3    S

(The NAs are because of an incomplete areas definition.)
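Note that merge() sorts the result by the key column by default, so the rows come back in a different order than in df. If the original row order matters, one possible alternative (a sketch, not the only way) is to look up each square with match() against the same areas table:

```r
# Same example data as in the merge above
df <- data.frame(year = 2000,
                 month = c(2, 2, 2, 2, 2, 3, 3, 3),
                 square = c("A1", "B2", "H5", "J9", "A2", "N8", "M9", "C7"),
                 stringsAsFactors = FALSE)
areas <- data.frame(square = c("A1", "A2", "B1", "H5", "J9", "M9", "N8"),
                    area = c("N", "N", "N", "W", "E", "S", "S"),
                    stringsAsFactors = FALSE)

# match() gives, for each df$square, its row position in areas (NA if absent);
# indexing areas$area with those positions keeps df's original row order
df$area <- areas$area[match(df$square, areas$square)]
```

As with the merge, squares that are missing from areas (here B2 and C7) end up as NA.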

Upvotes: 1

Nate

Reputation: 10671

    d <- data.frame(year = rep(2000, 8), month = rep(3, 8),
                    square = c("A1", "B2", "H5", "J9", "A2", "N8", "M9", "C7"))

    N <- c("A1", "A2", "B2")

    d$area <- NA_character_  # create the column before filling it row by row
    for (i in seq_len(nrow(d))) {
        if (d$square[i] %in% N) {
            d$area[i] <- "North"
        } else {
            d$area[i] <- "Somewhere Else"
        }
    }

Layer in else if () statements in the for loop to handle the ID vectors for the other cardinal directions.
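For illustration, the chained version might look something like the sketch below. Only N comes from the question; the S, E and W vectors are assumed designations made up for the example.

```r
d <- data.frame(year = rep(2000, 8), month = rep(3, 8),
                square = c("A1", "B2", "H5", "J9", "A2", "N8", "M9", "C7"),
                stringsAsFactors = FALSE)

N <- c("A1", "A2", "B2")
# Assumed designations for the other directions (not given in the question)
S <- c("H5", "J9")
E <- c("N8", "M9")
W <- c("C7")

d$area <- NA_character_  # create the column before filling it
for (i in seq_len(nrow(d))) {
    sq <- d$square[i]
    if (sq %in% N) {
        d$area[i] <- "North"
    } else if (sq %in% S) {
        d$area[i] <- "South"
    } else if (sq %in% E) {
        d$area[i] <- "East"
    } else if (sq %in% W) {
        d$area[i] <- "West"
    } else {
        d$area[i] <- "Somewhere Else"  # fallback for unlisted squares
    }
}
```

Note the assignment uses `<-`. Writing `==` there only compares the values and throws the result away, which would leave the column unchanged.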

Upvotes: 0
