Reputation: 27
I'm quite new to R and I'm coming from a C++ background. I have a data frame with over 60 thousand rows and around 15 columns. The loop below counts, for every column from the 7th onward, how often a value equals the value in the next row, skipping NA entries, but it takes forever to run. Is there a better way to do this? Help is greatly appreciated!
counter <- 0
for (j in 7:length(SeaStateData[3, ]))
{
  for (i in 1:length(SeaStateData[, 3]))
  {
    if (!is.na(SeaStateData[i, j]) & !is.na(SeaStateData[i + 1, j]))
      if (SeaStateData[i, j] == SeaStateData[i + 1, j])
      {
        counter <- counter + 1
      }
  }
}
Upvotes: 1
Views: 108
Reputation: 60858
I'd try this:
nr <- nrow(SeaStateData)
nc <- ncol(SeaStateData)
counter <- sum(SeaStateData[1:(nr - 1), 7:nc] ==
               SeaStateData[2:nr, 7:nc],
               na.rm = TRUE)
The subsets represent two submatrices, with a relative offset of one row. The == operator will yield a logical vector (in this case a matrix, which is just a vector with added dimension information) containing TRUE if two items match, FALSE if they differ, and NA if at least one of them is NA. The sum over a logical vector counts all TRUE values. The na.rm argument tells it to drop NA values; otherwise the sum would be NA as well. sum(…, na.rm = TRUE) is roughly the same as sum(na.omit(…)).
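To see the offset comparison at work, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical stand-ins for columns 7 onward of SeaStateData; the explicit loop is only there as a cross-check against the vectorized count.

```r
# Hypothetical toy data standing in for columns 7 onward of SeaStateData.
toy <- data.frame(
  a = c(1, 1, NA, 2, 2),
  b = c(3, 4, 4, NA, NA)
)
nr <- nrow(toy)

# Compare every row with the row below it. The result is a logical
# matrix holding TRUE, FALSE, or NA for each pair of neighbours.
same <- toy[1:(nr - 1), ] == toy[2:nr, ]
print(same)

# Count the TRUE entries; comparisons involving NA are dropped.
counter <- sum(same, na.rm = TRUE)

# Cross-check with an explicit loop over neighbouring rows.
loop_counter <- 0
for (j in seq_len(ncol(toy))) {
  for (i in seq_len(nr - 1)) {
    x <- toy[i, j]
    y <- toy[i + 1, j]
    if (!is.na(x) && !is.na(y) && x == y) {
      loop_counter <- loop_counter + 1
    }
  }
}

print(counter)       # 3
print(loop_counter)  # 3
```

Column a contributes two matches (1, 1 and 2, 2) and column b contributes one (4, 4); every pair that involves an NA is left out of the count.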
Upvotes: 5