Brian Huey

Reputation: 1630

R apply result is inconsistent

I call apply to run a function over each row of my data frame, but I am getting some strange results. When I first run apply (run #1), only a subset of the rows produce the expected result. After running apply a second time (run #2), some of the values that were initially incorrect are correct. The same rows are incorrect after run #1 every time.

assign_id() compares the ID in the column first against the nine other columns of the data frame and returns the integer corresponding to the column that matches.

assign_id <- function(row) {
  if(is.na(row['first'])) {
    return(NULL)
  }
  else if(row['first'] %in% c('none')) {
    return(0)
  }
  else if(row['first'] %in% as.character(row['one'])){
    return(1)
  }
  else if(row['first'] %in% as.character(row['two'])){
    return(2)
  }
  else if(row['first'] %in% as.character(row['three'])){
    return(3)
  }
  else if(row['first'] %in% as.character(row['four'])){
    return(4)
  }
  else if(row['first'] %in% as.character(row['five'])){
    return(5)
  }
  else if(row['first'] %in% as.character(row['six'])){
    return(6)
  }
  else if(row['first'] %in% as.character(row['seven'])){
    return(7)
  }
  else if(row['first'] %in% as.character(row['eight'])){
    return(8)
  }
  else if(row['first'] %in% as.character(row['nine'])){
    return(9)
  } else {
    return(11)
  }
}

df <- read.csv('df.csv')

# Run #1
df$id <- apply(df, 1, assign_id)
# All 'id' fields return 11
df[df$first %in% 55627, c('id', 'first', 'six')] 

> head(df[df$first %in% 55627, c('id', 'first', 'six')])
     id first    six
414  11 55627  55627
529  11 55627 118950
791  11 55627  55627
1570 11 55627 118950
1832 11 55627 118950
2116 11 55627 118950

# Run #2
df$id <- apply(df, 1, assign_id)
# All 'id' fields return the correct integer
df[df$first %in% 55627, c('id', 'first', 'six')] 

> head(df[df$first %in% 55627, c('id', 'first', 'six')])
     id first    six
414   6 55627  55627
529   5 55627 118950
791   6 55627  55627
1570  8 55627 118950
1832  5 55627 118950
2116  5 55627 118950

Data is located here

Upvotes: 0

Views: 277

Answers (1)

iod

Reputation: 7592

With a little help from friends, I came up with this simpler, base-R solution:

df$id<-unlist(apply(df,1,function(x)
  ifelse(x["first"]=="none",0, which(as.integer(x["first"])==as.integer(x[2:10])))))

See the answers there for a fuller explanation of why apply was problematic. Briefly, it converted all your data to character, but padded the numeric values in a way that made the string comparisons fail.
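Here is a minimal sketch of that padding, using a small made-up data frame (toy, its column names, and its values are illustrative assumptions, not your actual data). As far as I understand it, apply calls as.matrix on the data frame, and with mixed column types the numeric columns are run through format, which pads them to a common width:

```r
# Hypothetical toy data: 'first' is character because it contains "none",
# 'six' is numeric with values of different widths.
toy <- data.frame(first = c("55627", "none", "55627"),
                  six   = c(55627, 118950, 118950),
                  stringsAsFactors = FALSE)

# This is the conversion apply() performs before calling the function.
# Mixed column types give a character matrix, and numeric columns are
# padded to a common width by format().
m <- as.matrix(toy)
m[1, "six"]                      # " 55627" (note the leading space)

# The string comparison fails because of the padding ...
m[1, "first"] %in% m[1, "six"]   # FALSE

# ... while comparing as integers ignores the leading whitespace.
as.integer(m[1, "first"]) == as.integer(m[1, "six"])   # TRUE
```

This is why the solution above converts both sides with as.integer before comparing.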

On a related note, when you call read.csv, you might want to add stringsAsFactors=FALSE to avoid making the first column a factor.
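A small sketch of the difference, using an inline CSV string as a stand-in for df.csv (csv_text and its contents are made up for illustration). Note that from R 4.0.0 onward stringsAsFactors already defaults to FALSE, so this mainly matters on older versions:

```r
# Hypothetical CSV content standing in for df.csv.
csv_text <- "first,six\n55627,55627\nnone,118950\n"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_char   <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$first)   # "factor"
class(as_char$first)     # "character"

# as.integer() on a factor returns the internal level codes, not the IDs,
# which is an easy way to get silently wrong comparisons.
as.integer(as_factor$first)

# On a character column the IDs survive; "none" becomes NA with a warning.
suppressWarnings(as.integer(as_char$first))
```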

Upvotes: 1
