Reputation: 2890
I have a data frame with multiple columns, and I want to isolate two of the columns and get the total number of unique values across them. Here's an example of what I mean:
Let's say I have a data frame df:
df <- data.frame(v1 = c(1, 2, 3, 2, "a"), v2 = c("a", 2, "b", "b", 4))
df
v1 v2
1 1 a
2 2 2
3 3 b
4 2 b
5 a 4
What I'm trying to do is extract just the unique values across the two columns. If I use unique() on each column separately, the output looks like this:
> unique(df[,1])
[1] 1 2 3 a
> unique(df[,2])
[1] a 2 b 4
But this is no good, as it only finds the unique values per column, whereas I need the total number of unique values across the two columns. For instance, 'a' appears in both columns, but I only want it counted once. As an example of the output I need, imagine the columns v1 and v2 stacked on top of each other like so:
V1_V2
1 1
2 2
3 3
4 2
5 a
6 a
7 2
8 b
9 b
10 4
The unique values of V1_V2 would be:
V1_V2
1 1
2 2
3 3
5 a
8 b
10 4
Then I could just count the rows using nrow(). Any ideas how I'd achieve this?
Upvotes: 18
Views: 61509
Reputation: 1011
With this approach, you can obtain the unique values no matter how many columns you have:
df2 <- as.vector(as.matrix(df))
unique(df2)
Then just use length() on the result to count them.
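Putting the steps above together, a minimal sketch might look like the following. It assumes the question's df, rebuilt here with stringsAsFactors = FALSE so the columns are plain character vectors on any R version; the names all_values, unique_values and n_unique are only illustrative. Note that as.matrix() coerces all columns to a common type, so this fits best when the columns already share one.

```r
# Recreate the question's data frame; stringsAsFactors = FALSE keeps the
# columns as character vectors regardless of R version.
df <- data.frame(v1 = c(1, 2, 3, 2, "a"),
                 v2 = c("a", 2, "b", "b", 4),
                 stringsAsFactors = FALSE)

# Flatten every column into one vector (column by column).
all_values <- as.vector(as.matrix(df))

# Deduplicate across all columns, then count.
unique_values <- unique(all_values)
n_unique <- length(unique_values)

print(unique_values)
print(n_unique)
```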
Upvotes: 15
Reputation: 908
A generic approach:
uq_elem <- c()
for (i in seq_len(ncol(df))) {
  # Prepend this column's unique values, then deduplicate the running result
  uq_elem <- unique(c(unique(df[, i]), uq_elem))
}
All the distinct elements will end up in uq_elem.
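For comparison, the same set of values can probably be collected without an explicit loop. The sketch below is an alternative to the loop, not a restatement of it: it assumes the question's df with character columns, and uq_vectorised is just an illustrative name. The elements come out in column order, whereas the loop prepends each column's values, so the two results may differ in order but not in content.

```r
# Recreate the question's data frame with character columns.
df <- data.frame(v1 = c(1, 2, 3, 2, "a"),
                 v2 = c("a", 2, "b", "b", 4),
                 stringsAsFactors = FALSE)

# unlist() concatenates the columns into one vector;
# use.names = FALSE drops the generated element names (v11, v12, ...).
uq_vectorised <- unique(unlist(df, use.names = FALSE))

print(uq_vectorised)
print(length(uq_vectorised))
```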
Upvotes: 0
Reputation: 12937
This is well suited for union():
data.frame(V1_V2=union(df$v1, df$v2))
# V1_V2
#1 1
#2 2
#3 3
#4 a
#5 b
#6 4
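union() takes exactly two vectors, so for more than two columns one option is to fold it over the columns with Reduce(). This is a sketch under assumptions: df3 and its third column v3 are hypothetical and not part of the question, and the columns are kept as character vectors.

```r
# Hypothetical three-column variant of the question's data frame.
df3 <- data.frame(v1 = c(1, 2, 3, 2, "a"),
                  v2 = c("a", 2, "b", "b", 4),
                  v3 = c("b", "c", 1, "c", "a"),
                  stringsAsFactors = FALSE)

# A data frame is a list of columns, so Reduce() applies union() pairwise:
# union(union(v1, v2), v3).
all_unique <- Reduce(union, df3)

print(all_unique)
print(length(all_unique))
```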
Upvotes: 14