Reputation: 95
For this problem I am trying to examine a column within the dataset data called firstdigits (it is the 22nd column), determine how many times each value occurs, and put that into a new column called count (the 27th column). So say that a 1 occurs in data$firstdigits a total of 5 times, everywhere where data$firstdigits=1, I want data$count=5 in that row.
The method that I've come up with might work but its so clunky that it hasn't finished running yet for me to know. I'm looking for a faster way to achieve this.
unique = as.data.frame(unique(data$firstdigits))
count = as.data.frame(0)
for (i in 1:nrow(unique)){
count[i,1] = sum(data$firstdigits == unique[i,1])
}
data$count = 0
for(j in 1:nrow(data)){
for(k in 1:nrow(unique)){
if (data[j,22] == unique[k,1]){
data[j,27] == count[k,1]
}
}
}
Upvotes: 2
Views: 211
Reputation: 13581
As the commenter suggested, you can use table
and dplyr
. I'll make up a data frame
df <- data.frame( firstdigits <- round(runif(100)*10) )
df
firstdigits
1 1
2 7
3 1
4 2
5 1
6 0
Use table
to count unique values
tbl.df <- table( df$firstdigits )
tbl.df
0 1 2 3 4 5 6 7 8 9 10
9 10 11 9 15 7 7 12 6 7 7
Then use dplyr::mutate
to bind the counts as a new column
df <- df %>%
mutate( count = tbl.df[as.character(firstdigits)] )
Note I'm using the character value to index tbl.df
. tbl.df[0]
is not a valid index, whereas tbl.df["0"]
will give 9.
Upvotes: 0
Reputation: 3492
How about sqldf
library(sqldf)
> df=NULL
> df$col26=sample(100,100,T)
> df=as.data.frame(df)
> df2=sqldf("select count(col26) as col27,col26 from df group by col26")
> df2=as.data.frame(df2)
> str(df)
'data.frame': 100 obs. of 1 variable:
$ col26: int 21 49 99 100 46 72 32 84 44 100 ...
> str(df2)
'data.frame': 57 obs. of 2 variables:
$ col27 : int 1 1 1 3 1 1 1 1 4 1 ...
$ col26 : int 6 9 10 11 12 14 16 17 21 22 ...
> df3=merge(df2,df,by="col26",all.y=TRUE)
> str(df3)
'data.frame': 100 obs. of 2 variables:
$ col26 : int 6 9 10 11 11 11 12 14 16 17 ...
$ col27 : int 1 1 1 3 3 3 1 1 1 1 ...
Upvotes: 0
Reputation: 7023
Maybe you could drop the nested loop entirely:
Using a loop you can go through all unique values in data$firstdigits
and then assign the number of occurrences in data$count
:
## create count column if necessary
# data$count <- 0
for (v in unique(data$firstdigits)){
# number of occurences x
x <- sum(data$firstdigits == v)
data$count[data$firstdigits == v] <- x
}
Upvotes: 4