Reputation: 1630
I call apply() to run a function over each row of my dataframe, but I am getting some strange results. When I first run apply() (run #1), only a subset of the rows produce the expected result. After running apply() a second time (run #2), some of the values that were initially incorrect are correct. The same rows are incorrect after run #1 every time.
assign_id() compares the ID located in the column first against the nine other columns within the dataframe, returning an integer corresponding to the column that matches.
assign_id <- function(row) {
  if(is.na(row['first'])) {
    return(NULL)
  }
  else if(row['first'] %in% c('none')) {
    return(0)
  }
  else if(row['first'] %in% as.character(row['one'])){
    return(1)
  }
  else if(row['first'] %in% as.character(row['two'])){
    return(2)
  }
  else if(row['first'] %in% as.character(row['three'])){
    return(3)
  }
  else if(row['first'] %in% as.character(row['four'])){
    return(4)
  }
  else if(row['first'] %in% as.character(row['five'])){
    return(5)
  }
  else if(row['first'] %in% as.character(row['six'])){
    return(6)
  }
  else if(row['first'] %in% as.character(row['seven'])){
    return(7)
  }
  else if(row['first'] %in% as.character(row['eight'])){
    return(8)
  }
  else if(row['first'] %in% as.character(row['nine'])){
    return(9)
  } else {
    return(11)
  }
}
df <- read.csv('df.csv')
# Run #1
df$id <- apply(df, 1, assign_id)
# All 'id' fields return 11
df[df$first %in% 55627, c('id', 'first', 'six')]
> head(df[df$first %in% 55627, c('id', 'first', 'six')])
     id first    six
414  11 55627  55627
529  11 55627 118950
791  11 55627  55627
1570 11 55627 118950
1832 11 55627 118950
2116 11 55627 118950
# Run #2
df$id <- apply(df, 1, assign_id)
# All 'id' fields return the correct integer
df[df$first %in% 55627, c('id', 'first', 'six')]
> head(df[df$first %in% 55627, c('id', 'first', 'six')])
     id first    six
414   6 55627  55627
529   5 55627 118950
791   6 55627  55627
1570  8 55627 118950
1832  5 55627 118950
2116  5 55627 118950
Data is located here
Upvotes: 0
Views: 277
Reputation: 7592
With a little help from friends, I came up with this simpler, base-R solution:
df$id <- unlist(apply(df, 1, function(x)
  ifelse(x["first"] == "none", 0,
         which(as.integer(x["first"]) == as.integer(x[2:10])))))
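See the answers there for an explanation of why apply() was problematic. Briefly, apply() first converts the data frame to a matrix with as.matrix(); because your columns are of mixed types, everything became character, and the numeric values were padded with spaces to a common width, which made the comparisons fail.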
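The padding can be seen on a small made-up data frame. This is only a sketch: the toy data frame and its values are assumptions for illustration, not your actual file, and the column names are borrowed from your question.

```r
# Hypothetical toy data: 'first' is character (it can hold "none"),
# 'six' is numeric with values of different widths.
toy <- data.frame(first = c("55627", "none"),
                  six   = c(55627, 118950),
                  stringsAsFactors = FALSE)

# apply() coerces its input with as.matrix(), so inspect that matrix directly.
m <- as.matrix(toy)
print(m[, "six"])  # the numbers are now strings formatted to a common width

# Compare the ID with the coerced value, with and without trimming whitespace.
padded_match  <- m[1, "first"] == m[1, "six"]
trimmed_match <- m[1, "first"] == trimws(m[1, "six"])
print(c(padded = padded_match, trimmed = trimmed_match))
```

Converting both sides with as.integer(), as in the one-liner above, sidesteps the problem because leading spaces are ignored when the strings are parsed as numbers.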
On a related note, when you call read.csv, you might want to add stringsAsFactors=FALSE to avoid making the column named first a factor (before R 4.0.0 the default was TRUE).
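To check what read.csv does with that column, something along these lines should work. The CSV text below is a made-up stand-in for df.csv, so treat the column names and values as assumptions.

```r
# Hypothetical CSV contents standing in for df.csv.
csv_text <- "first,one,two\n55627,55627,118950\nnone,1,2\n"

# Read the same text twice, once per setting, and compare the column class.
df_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

print(class(df_chr$first))  # character
print(class(df_fct$first))  # factor
```

With a character column, comparisons against "none" or against an ID string behave as plain string comparisons, which is usually what you want here.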
Upvotes: 1