erikfjonsson

Reputation: 197

Use a for loop to create a column containing the count of each value in another column

First, I'm new to Stack Overflow and new to R, so please bear with me if I have misunderstood something.

I have a dataframe with several columns. I am trying to create a new column that contains, for each row, the number of times that row's value occurs in one of the other columns. The task specifies that I need to use a for loop to achieve this, even though it might not be the most effective or efficient method.

I have tried with this approach but for some reason it does not work.

for (i in nrow(df)) {
   df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}

If you have data like this:

old_col   name
   1       a
   1       b
   2       c
   3       d

The code should yield:

old_col   name   new_col
   1       a        2
   1       b        2
   2       c        1
   3       d        1

I am grateful for any help!

Upvotes: 2

Views: 2305

Answers (4)

Ian

Reputation: 451

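You are just missing the 1: in the for line. for (i in nrow(df)) iterates over the single value nrow(df), so the loop body runs only once, for the last row; for (i in 1:nrow(df)) iterates over every row.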

library(tibble) # provides tribble()

df <- tribble(
  ~old_col,   ~name,
  1,         "a",
  1,         "b",
  2,         "c",
  3,         "d")

df$new_col <- NA

for (i in 1:nrow(df)) {
  df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}

#         old_col name  new_col
#          <dbl> <chr>   <int>
#   1       1      a      2
#   2       1      b      2
#   3       2      c      1
#   4       3      d      1
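
To make the difference concrete, here is a minimal base R sketch that runs both loops on the question's data. The names df_once and df_all are only illustrative, and seq_len(nrow(df)) is used here as an equivalent of 1:nrow(df) for a non-empty data frame; it has the added benefit of looping zero times when the data frame has no rows.

```r
df <- data.frame(old_col = c(1, 1, 2, 3), name = c("a", "b", "c", "d"))

# Loop from the question: nrow(df) is the single number 4,
# so the body runs once, with i = 4, and only the last row is filled in.
df_once <- df
for (i in nrow(df_once)) {
  df_once$new_col[i] <- sum(df_once$old_col == df_once$old_col[i], na.rm = TRUE)
}
print(df_once$new_col)

# Corrected loop: seq_len(nrow(df)) is the vector 1, 2, 3, 4,
# so the body runs once per row.
df_all <- df
df_all$new_col <- NA_integer_
for (i in seq_len(nrow(df_all))) {
  df_all$new_col[i] <- sum(df_all$old_col == df_all$old_col[i], na.rm = TRUE)
}
print(df_all$new_col)
```

The first print should show NA for every row except the last, which is why the original attempt looks like it "does not work".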

Upvotes: 0

Jim Chen

Reputation: 3749

For your own code, just change nrow(df) to 1:nrow(df), and it should work perfectly:

for (i in 1:nrow(df)) {
  df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}

Another approach:

new_col <- sapply(df$old_col, function(x) sum(df$old_col == x, na.rm = TRUE))
df <- cbind(df, new_col)

Upvotes: 0

MichaelChirico

Reputation: 34763

What you're after is "count by groups" -- group by old_col and count the number of rows with that value of old_col.

This is a very common operation and data manipulation packages make it easy to do this. My personal choice of data package is data.table, where your operation can be expressed as:

library(data.table)
setDT(df) # convert to data.table to 'unlock' the correct syntax
df[ , new_col := .N, by = old_col]

With your data:

df = data.frame(old_col = c(1, 1:3), name = letters[1:4])

output:

   old_col name new_col
1:       1    a       2
2:       1    b       2
3:       2    c       1
4:       3    d       1

If forced to do this with a for loop, I strongly recommend not using 1:nrow(df). Probably the most effective way is to use a table:

counts = as.data.frame(table(old_col = df$old_col))

for (ii in 1:nrow(counts)) {
  df$new_col[df$old_col == counts$old_col[ii]] = counts$Freq[ii]
}

This avoids repeatedly counting the number of rows -- imagine in old_col you had 1,000,000 repetitions of 1. You wouldn't want to count up to 1,000,000 one million times (once for each appearance of 1); better to count to 1,000,000 once only.
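
As a rough sketch of the same count-once idea in base R, the table can also be used as a lookup, indexed by each row's value converted to character. This is only one possible way to write it; it assumes the values of old_col convert to the same strings that table() uses as names, which holds for the example data.

```r
df <- data.frame(old_col = c(1, 1, 2, 3), name = c("a", "b", "c", "d"))

# Count each distinct value of old_col exactly once
counts <- table(df$old_col)

# Look up every row's count by name; the table's names are the
# distinct values of old_col as character strings ("1", "2", "3")
df$new_col <- as.integer(counts[as.character(df$old_col)])

print(df)
```

Each distinct value is counted once, and each row then costs only a lookup.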

Upvotes: 0

Saurabh Chauhan

Reputation: 3221

You can try this (a solution for a complete beginner):

for(i in 1:nrow(df)){
  if(i==1){
   df$new_col[i]=1 # For first point
  }
  else if(df$old_col[i]==df$old_col[i-1]){
   df$new_col[i]=df$new_col[i-1]+1 # If old_col values are same
  }
  else{
   df$new_col[i]=1  # When we have a new old_col value
  }
} 

Output:

    old_col name new_col
1       1    a       1
2       1    b       2
3       2    c       1
4       3    d       1
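
The loop above gives a running count within each run of equal values, so the first row gets 1 rather than the 2 the question expects. If the total per value is wanted, one possible extension is a second, backward pass that copies the final count of each run to the earlier rows of that run. This is only a sketch: it assumes that equal values of old_col sit on adjacent rows and that the data frame has at least one row.

```r
df <- data.frame(old_col = c(1, 1, 2, 3), name = c("a", "b", "c", "d"))
n <- nrow(df)

# Forward pass: running count within each run of equal values
df$new_col <- 1L
for (i in seq_len(n)[-1]) {
  if (df$old_col[i] == df$old_col[i - 1]) {
    df$new_col[i] <- df$new_col[i - 1] + 1L
  }
}

# Backward pass: the last row of each run holds the run's total,
# so copy it to the earlier rows of the same run
for (i in rev(seq_len(n - 1))) {
  if (df$old_col[i] == df$old_col[i + 1]) {
    df$new_col[i] <- df$new_col[i + 1]
  }
}

print(df)
```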

Upvotes: 1
