Reputation: 197
First, I'm new to Stack Overflow and new to R, so please bear with me regarding any misunderstandings on my side.
I have a data frame with several columns. I am trying to create a new column that contains, for each row, the number of times that row's value in one of the other columns occurs. The task specifies that I need to use a for loop to achieve this, even though it might not be the most effective or efficient method.
I have tried this approach, but for some reason it does not work:
for (i in nrow(df)) {
df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}
If you have data like this:
old_col name
1 a
1 b
2 c
3 d
The code should yield:
old_col name new_col
1 a 2
1 b 2
2 c 1
3 d 1
I am grateful for any help!
Upvotes: 2
Views: 2305
Reputation: 451
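You are just missing the 1: in the for (i in 1:nrow(df)) line. nrow(df) on its own is a single number, so the loop body runs only once, for the last row.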
library(tibble) # provides tribble()
df <-
tribble(
~old_col, ~name,
1, "a",
1, "b",
2, "c",
3, "d")
df$new_col <- NA
for (i in 1:nrow(df)) {
df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}
# old_col name new_col
# <dbl> <chr> <int>
# 1 1 a 2
# 2 1 b 2
# 3 2 c 1
# 4 3 d 1
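To make the difference concrete, here is a minimal sketch that records which values of i each loop header visits. The vectors visited_wrong and visited_right are names made up for this illustration, and the data frame is rebuilt with base R's data.frame() rather than tribble() so the snippet needs no packages.

```r
# Same sample data as in the question, built with base R
df <- data.frame(old_col = c(1, 1, 2, 3), name = c("a", "b", "c", "d"))

# nrow(df) is the single number 4, so this loop runs once, with i = 4
visited_wrong <- c()
for (i in nrow(df)) {
  visited_wrong <- c(visited_wrong, i)
}

# 1:nrow(df) is the vector 1, 2, 3, 4, so this loop visits every row
visited_right <- c()
for (i in 1:nrow(df)) {
  visited_right <- c(visited_right, i)
}

print(visited_wrong)
print(visited_right)
```

With the original header only the last row of new_col is ever assigned, which is why the result looked wrong.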
Upvotes: 0
Reputation: 3749
For your own code, just change nrow(df) to 1:nrow(df), and it should work perfectly:
for (i in 1:nrow(df)) {
df$new_col[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}
Another approach:
new_col <- sapply(df$old_col, function(x) sum(df$old_col == x, na.rm = TRUE))
df <- cbind(df, new_col)
Upvotes: 0
Reputation: 34763
What you're after is "count by groups": group by old_col and count the number of rows with that value of old_col.
This is a very common operation, and data manipulation packages make it easy to do. My personal choice of data package is data.table, where your operation can be expressed as:
library(data.table)
setDT(df) # convert to data.table to 'unlock' the correct syntax
df[ , new_col := .N, by = old_col]
With your data:
df = data.frame(old_col = c(1, 1:3), name = letters[1:4])
output:
old_col name new_col
1: 1 a 2
2: 1 b 2
3: 2 c 1
4: 3 d 1
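For readers who do not have data.table installed, the same "count by groups" idea can be sketched in base R. This is an alternative that the answer itself does not mention; it assumes the base function ave(), which applies a function within each group and returns a vector as long as its input.

```r
# Same sample data as above, as a plain data.frame
df <- data.frame(old_col = c(1, 1:3), name = letters[1:4])

# For each row, the size of the group of rows sharing its old_col value
df$new_col <- ave(df$old_col, df$old_col, FUN = length)

print(df)
```

Like the data.table version, this needs no explicit loop, so it does not satisfy the "must use a for loop" requirement from the question.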
If forced to do this with a for loop, I strongly recommend not using 1:nrow(df). Probably the most effective way is to use a table:
counts = as.data.frame(table(old_col = df$old_col))
for (ii in 1:nrow(counts)) {
df$new_col[df$old_col == counts$old_col[ii]] = counts$Freq[ii]
}
This avoids repeatedly counting the number of rows. Imagine that old_col contained 1,000,000 repetitions of 1. You wouldn't want to count up to 1,000,000 one million times (once for each appearance of 1); it is better to count to 1,000,000 only once.
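As a rough illustration of that point, the sketch below runs both loops on a small made-up data set and checks that they agree. The vectors per_row and per_value are hypothetical names used only here, and the results are written into plain vectors rather than into df to keep the comparison simple.

```r
# Toy data: five repetitions of 1, then single occurrences of 2 and 3
df <- data.frame(old_col = c(rep(1, 5), 2, 3))

# Row-by-row loop: one full pass over old_col per row (7 passes here)
per_row <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  per_row[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}

# Table-based loop: one iteration per distinct value (3 iterations here)
counts <- as.data.frame(table(old_col = df$old_col))
per_value <- integer(nrow(df))
for (ii in seq_len(nrow(counts))) {
  per_value[df$old_col == counts$old_col[ii]] <- counts$Freq[ii]
}

print(identical(per_row, per_value))
print(c(rows = nrow(df), distinct_values = nrow(counts)))
```

The more often values repeat, the larger the gap between the number of rows and the number of distinct values, and the more work the table-based loop saves.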
Upvotes: 0
Reputation: 3221
You can try this (a solution for absolute beginners):
for(i in 1:nrow(df)){
if(i==1){
df$new_col[i]=1 # For first point
}
else if(df$old_col[i]==df$old_col[i-1]){
df$new_col[i]=df$new_col[i-1]+1 # If old_col values are same
}
else{
df$new_col[i]=1 # When we have a new old_col value
}
}
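Output (note that this is a running count within each run of consecutive equal values, so it differs from the totals requested in the question):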
old_col name new_col
1 1 a 1
2 1 b 2
3 2 c 1
4 3 d 1
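The loop above compares each row only with the row before it, so it produces a running count rather than the per-value totals shown in the question. A minimal sketch contrasting the two on the question's sample data follows; the vectors running and total are names invented for this illustration.

```r
# Sample data from the question
df <- data.frame(old_col = c(1, 1, 2, 3), name = c("a", "b", "c", "d"))

# Running count: restarts at 1 whenever the value differs from the previous row
running <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  if (i > 1 && df$old_col[i] == df$old_col[i - 1]) {
    running[i] <- running[i - 1] + 1L
  } else {
    running[i] <- 1L
  }
}

# Total count: how many rows in the whole column share this row's value
total <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  total[i] <- sum(df$old_col == df$old_col[i], na.rm = TRUE)
}

print(running)
print(total)
```

The running count also depends on equal values sitting next to each other, so it would change if the rows were reordered, whereas the totals would not.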
Upvotes: 1