I am trying to replicate a very simple piece of VBA code in R to identify duplicates. My goal is to count how many duplicates there are in a sample data set, first reading along each row, then down each column. So I came up with this sample:
x <- matrix(data = c("Ali", "Ali", "Abu", "Ali", "Ahmad", "siti",
                     "Ali", "Abu", "Ahmad", "Ali", "Abu", "Aisyah",
                     "Khalid", "Robin", "Ahmad", "Ali", "JOrdan", "siti"),
            nrow = 6)
x <- data.frame(x, stringsAsFactors = FALSE)  # keep names as plain strings
colnames(x) <- c("nama1", "nama2", "nama3")
This gives the following data frame:
  nama1  nama2  nama3
1   Ali    Ali Khalid
2   Ali    Abu  Robin
3   Abu  Ahmad  Ahmad
4   Ali    Ali    Ali
5 Ahmad    Abu JOrdan
6  siti Aisyah   siti
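Essentially, what I want to do for the first column is this:
c <- 0
# Compare each value in column 1 with the value in the next row
for (i in seq_len(nrow(x) - 1)) {
  if (x[i, 1] == x[i + 1, 1]) {
    c <- c + 1
  }
}
print(c)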
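The loop above only compares each row with the one directly below it, so it counts repeats in consecutive rows only (for example, the "Ali" in row 4 of nama1 is not matched against rows 1 and 2). A minimal vectorised sketch of that same adjacent comparison for every column, assuming plain character columns (the name adjacent_matches is just illustrative):

```r
# Sample data from the question, kept as plain character columns
x <- data.frame(matrix(c("Ali", "Ali", "Abu", "Ali", "Ahmad", "siti",
                         "Ali", "Abu", "Ahmad", "Ali", "Abu", "Aisyah",
                         "Khalid", "Robin", "Ahmad", "Ali", "JOrdan", "siti"),
                       nrow = 6),
                stringsAsFactors = FALSE)
colnames(x) <- c("nama1", "nama2", "nama3")

# For each column v, compare v[1..n-1] with v[2..n] element-wise.
# Each TRUE means "this value equals the value in the next row",
# which is what the loop over column 1 counts.
adjacent_matches <- sapply(x, function(v) sum(v[-length(v)] == v[-1]))
print(adjacent_matches)
# nama1 nama2 nama3
#     1     0     0
```

Because only neighbouring rows are compared, this undercounts whenever a repeated value is not in consecutive rows; the duplicated()-based approach in the answer does not have that limitation.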
Ultimately, I want the number of duplicates in each row, and then the same count for each column, along the lines of this VBA nested loop:
For i = 1 To 10
    For j = 1 To 20
        Cells(i, j) = "XXX"
        ' do this
    Next j
Next i
The problem is that I don't know how to refer to an individual cell in R the way VBA does, e.g. Cells(i + 1, 1) = Cells(i, 1). I am just learning to do very simple data manipulation in R.
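In R, x[i, j] refers to the cell in row i and column j of a data frame or matrix, which is the closest equivalent of VBA's Cells(i, j); assignment works the same way, e.g. x[i + 1, 1] <- x[i, 1]. A minimal nested-loop sketch of the VBA approach, assuming a "duplicate" means a value that already appeared in an earlier column of the same row (row_dups is a hypothetical name for illustration):

```r
# Sample data from the question, kept as plain character columns
x <- data.frame(matrix(c("Ali", "Ali", "Abu", "Ali", "Ahmad", "siti",
                         "Ali", "Abu", "Ahmad", "Ali", "Abu", "Aisyah",
                         "Khalid", "Robin", "Ahmad", "Ali", "JOrdan", "siti"),
                       nrow = 6),
                stringsAsFactors = FALSE)
colnames(x) <- c("nama1", "nama2", "nama3")

# One counter per row; x[i, j] plays the role of VBA's Cells(i, j)
row_dups <- integer(nrow(x))
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    # Count the cell if its value already appears earlier in the same row
    if (j > 1 && x[i, j] %in% unlist(x[i, seq_len(j - 1)])) {
      row_dups[i] <- row_dups[i] + 1L
    }
  }
}
print(row_dups)
# [1] 1 0 1 2 0 1
```

Loops like this work, but R code usually avoids explicit cell-by-cell loops in favour of whole-vector functions such as duplicated(), which give the same per-row counts with much less code.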
At the end, I would like to sum the c values over all the columns, so it would be 4 + 4 + 6 = 14.
Any advice is welcome! Thanks
# Number of duplicated values within each row
apply(x, 1, function(v) sum(duplicated(v)))
#[1] 1 0 1 2 0 1
# Number of duplicated values within each column
apply(x, 2, function(v) sum(duplicated(v)))
#nama1 nama2 nama3
#    2     2     0
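duplicated() returns TRUE for the second and later occurrences of a value, so counting the TRUEs gives the number of repeats; wrapping the per-column counts in sum() gives the single grand total the question asks for. A minimal sketch under that definition, plus one alternative definition that is an assumption rather than part of the answer (counting every cell whose value occurs more than once, including the first occurrence):

```r
# Sample data from the question, kept as plain character columns
x <- data.frame(matrix(c("Ali", "Ali", "Abu", "Ali", "Ahmad", "siti",
                         "Ali", "Abu", "Ahmad", "Ali", "Abu", "Aisyah",
                         "Khalid", "Robin", "Ahmad", "Ali", "JOrdan", "siti"),
                       nrow = 6),
                stringsAsFactors = FALSE)
colnames(x) <- c("nama1", "nama2", "nama3")

# Repeats per column: second and later occurrences only
col_dups <- apply(x, 2, function(v) sum(duplicated(v)))
print(col_dups)  # nama1 2, nama2 2, nama3 0

# Grand total across all columns
total <- sum(col_dups)
print(total)     # 4

# Assumed alternative: flag every occurrence of a repeated value by
# scanning from both ends (fromLast = TRUE also marks the first copy)
col_all <- apply(x, 2, function(v) sum(duplicated(v) | duplicated(v, fromLast = TRUE)))
print(col_all)   # nama1 3, nama2 4, nama3 0
```

Which total is "right" depends on how a duplicate is defined, so it is worth deciding that before comparing against an expected figure.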