Caitlin
Caitlin

Reputation: 95

Optimize Nested For Loops in R

For this problem I am trying to examine a column within the dataset data called firstdigits (it is the 22nd column), determine how many times each value occurs, and put that into a new column called count (the 27th column). So say that a 1 occurs in data$firstdigits a total of 5 times, everywhere where data$firstdigits=1, I want data$count=5 in that row.

The method that I've come up with might work but its so clunky that it hasn't finished running yet for me to know. I'm looking for a faster way to achieve this.

unique = as.data.frame(unique(data$firstdigits))
count = as.data.frame(0)
for (i in 1:nrow(unique)){
  count[i,1] = sum(data$firstdigits == unique[i,1])
}

data$count = 0
for(j in 1:nrow(data)){
  for(k in 1:nrow(unique)){
    if (data[j,22] == unique[k,1]){
      data[j,27] == count[k,1]
    }
  }
}

Upvotes: 2

Views: 211

Answers (3)

CPak
CPak

Reputation: 13581

As the commenter suggested, you can use table and dplyr. I'll make up a data frame

df <- data.frame( firstdigits <- round(runif(100)*10) )
df

  firstdigits
1           1
2           7
3           1
4           2
5           1
6           0

Use table to count unique values

tbl.df <- table( df$firstdigits )
tbl.df

0  1  2  3  4  5  6  7  8  9 10 
9 10 11  9 15  7  7 12  6  7  7 

Then use dplyr::mutate to bind the counts as a new column

df <- df %>% 
      mutate( count = tbl.df[as.character(firstdigits)] )

Note I'm using the character value to index tbl.df. tbl.df[0] is not a valid index, whereas tbl.df["0"] will give 9.

Upvotes: 0

Ajay Ohri
Ajay Ohri

Reputation: 3492

How about sqldf

library(sqldf) 

> df=NULL
> df$col26=sample(100,100,T)
> df=as.data.frame(df)

> df2=sqldf("select count(col26) as col27,col26 from df group by col26")
> df2=as.data.frame(df2)

> str(df)
'data.frame':   100 obs. of  1 variable:
 $ col26: int  21 49 99 100 46 72 32 84 44 100 ...

> str(df2)
'data.frame':   57 obs. of  2 variables:
 $ col27       : int  1 1 1 3 1 1 1 1 4 1 ...
 $ col26       : int  6 9 10 11 12 14 16 17 21 22 ...

> df3=merge(df2,df,by="col26",all.y=TRUE)

> str(df3)
'data.frame':   100 obs. of  2 variables:
 $ col26       : int  6 9 10 11 11 11 12 14 16 17 ...
 $ col27       : int  1 1 1 3 3 3 1 1 1 1 ...

Upvotes: 0

Val
Val

Reputation: 7023

Maybe you could drop the nested loop entirely:

Using a loop you can go through all unique values in data$firstdigits and then assign the number of occurrences in data$count:

## create count column if necessary
# data$count <- 0

for (v in unique(data$firstdigits)){

# number of occurences x
x <- sum(data$firstdigits == v)

data$count[data$firstdigits == v] <- x

}

Upvotes: 4

Related Questions