Reputation: 71
I have a large dataset in which each column has a header followed by a series of values. I want to detect which values are duplicated anywhere in the dataset, and how many times each one occurs.
1 2 3 4 5 6 7
734 456 346 545 874 734 455
734 783 482 545 456 948 483
So, for example, it would report 734 three times, 456 twice, etc.
I've tried the duplicated function in R, but it seems to only work on whole rows or whole columns. Using
duplicated(df)
doesn't pick up any duplicates, though I know 734 appears twice in the first row.
So I'm asking how to detect duplicates both within and between columns/rows.
Cheers
Upvotes: 0
Views: 57
Reputation: 2829
You can transform it to a vector and then use table() as follows:
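library(data.table)  # fread()
library(dplyr)       # %>% pipe

# Read the two sample rows from an inline string
df <- fread("734 456 346 545 874 734 455
734 783 482 545 456 948 483")

# Flatten all columns into one vector and count each value
df %>% unlist() %>% table()
# 346 455 456 482 483 545 734 783 874 948
#   1   1   2   1   1   2   3   1   1   1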
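If you would rather not load any packages, the same counting can be done in base R. The following is only a sketch: it assumes the data is already in a data frame, and the column names V1 to V7 and the variable names counts and dups are made up for illustration. It also keeps only the values that occur more than once.

```r
# Same two rows as in the question, built as a plain data frame
# (column names V1..V7 are placeholders, not from the original data)
df <- data.frame(
  V1 = c(734, 734), V2 = c(456, 783), V3 = c(346, 482), V4 = c(545, 545),
  V5 = c(874, 456), V6 = c(734, 948), V7 = c(455, 483)
)

# Flatten every column into one vector, then count each distinct value
counts <- table(unlist(df, use.names = FALSE))

# Keep only the values that occur more than once
dups <- counts[counts > 1]
dups
```

For the sample data, dups contains 456 (twice), 545 (twice) and 734 (three times).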
Upvotes: 1
Reputation: 101044
You can use table() and data.frame() to count the occurrences of each value:
data.frame(table(v))
which gives
     v Freq
1    1    1
2    2    1
3    3    1
4    4    1
5    5    1
6    6    1
7    7    1
8  346    1
9  455    1
10 456    2
11 482    1
12 483    1
13 545    2
14 734    3
15 783    1
16 874    1
17 948    1
DATA
v <- c(1, 2, 3, 4, 5, 6, 7, 734, 456, 346, 545, 874, 734, 455, 734,
783, 482, 545, 456, 948, 483)
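To list only the duplicated values rather than every value, one option is to filter the frequency table on Freq, or to combine duplicated() with unique() on the flattened vector. This is a sketch using the same v as above; the names freq, dup_rows and dup_values are just for illustration.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 734, 456, 346, 545, 874, 734, 455, 734,
       783, 482, 545, 456, 948, 483)

# Frequency table as a data frame, then keep rows with more than one occurrence
freq <- data.frame(table(v))
dup_rows <- freq[freq$Freq > 1, ]
dup_rows

# Alternative: duplicated() works element-wise on a plain vector,
# so it flags every repeat after the first occurrence
dup_values <- unique(v[duplicated(v)])
dup_values
```

Here dup_rows keeps the rows for 456, 545 and 734, and dup_values holds the same three values in order of first repeat. Note that duplicated() behaves differently on a data frame, where it compares whole rows, which is why duplicated(df) finds nothing in the question.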
Upvotes: 2