Reputation: 2748
Assuming I have a data frame that looks like this:
category type
[1] A green
[2] A purple
[3] A orange
[4] B yellow
[5] B green
[6] B orange
[7] C green
How do I get a list containing those types that appear in each category? In this case it should look like:
type
[1] green
I know this question is basic and has probably been asked before, but my method is too long and I'm sure there's a more efficient way of doing it: I split the data frame by category and then take the set intersection. Is there a better way? Thanks!
Upvotes: 3
Views: 140
Reputation: 18602
Here's one approach using data.table, provided that type appears at most once per category:
library(data.table)
DT <- data.table(DF)
##
R> DT[
,list(
nCat=.N
),by=type][
nCat==length(unique(DT$category)),
type]
[1] "green"
All this does is aggregate the original data as a count of rows by type (nCat), and then subset that result by taking the rows where nCat is equal to the number of unique categories in DT.
Edit: Thanks to @Arun, this can be done more concisely with a newer version of data.table by taking advantage of the uniqueN function:
unique(DT)[, .N, by=type][N == uniqueN(DT$category), type]
If you aren't guaranteed that type will appear at most once per category, you can make a slight modification to the above:
R> DT[
,list(
nCat=length(unique(category))
),by=type][
nCat==length(unique(DT$category)),
type]
[1] "green"
Data:
DF <- read.table(
text="category type
A green
A purple
A orange
B yellow
B green
B orange
C green",
header=TRUE,
stringsAsFactors=F)
Upvotes: 3
Reputation: 9618
Assuming a type appears in a category at most once (otherwise change the == to >=) and using table, you could try the following:
colnames(table(df))[colSums(table(df)) == length(unique(df$category))]
[1] "green"
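If a type can repeat within a category, raw counts can mislead: a type that occurs three times in a single category would reach the threshold without being in every category. A minimal sketch of one way around this, assuming the same column names as the question, is to turn the table into a presence/absence matrix first. The sample data below is made up for illustration and deliberately contains repeated rows.

```r
# Illustrative data: purple appears three times, but only in category A
df <- data.frame(
  category = c("A", "A", "A", "A", "B", "B", "C"),
  type     = c("green", "purple", "purple", "purple", "green", "orange", "green"),
  stringsAsFactors = FALSE
)

# TRUE where a type occurs at least once in a category
present <- table(df$category, df$type) > 0

# Keep the types that are present in every category (every row of the matrix)
in_all <- colnames(present)[colSums(present) == nrow(present)]
in_all
```

Here purple has a raw count of 3, equal to the number of categories, yet it is excluded because it is present in only one row of the matrix.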
Upvotes: 4
Reputation: 3991
Assuming your data are in df:
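# Count the rows for each type; the result has columns Group.1 (type) and x (count)
df.sum <- aggregate(df$type, by = list(df$type), FUN = length)
# Keep the types whose count equals the number of distinct categories
types <- df.sum[which(df.sum$x == length(unique(df$category))), ]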
This will count the number of appearances of each type, and see which ones appear as many times as you have categories. If types don't appear more than once in a category, it will effectively do what you want, though it will not work if that assumption is violated.
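If the at-most-once assumption might not hold, one possible adjustment (a sketch, not part of the original answer) is to drop repeated category/type pairs with unique() before counting. The data below is illustrative and repeats green in category C on purpose; df.unique is a name introduced here.

```r
# Illustrative data: green is listed twice in category C
df <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C", "C"),
  type     = c("green", "purple", "orange", "yellow", "green", "orange", "green", "green"),
  stringsAsFactors = FALSE
)

# Drop repeated (category, type) pairs so each pair is counted once
df.unique <- unique(df[, c("category", "type")])

# Count the categories each type appears in; columns are Group.1 and x
df.sum <- aggregate(df.unique$type, by = list(df.unique$type), FUN = length)

# Keep the types whose count equals the number of distinct categories
types <- df.sum$Group.1[df.sum$x == length(unique(df$category))]
types
```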
Upvotes: 1
Reputation: 20811
One way would be to make a table and either select the types whose count equals the number of categories (3 in this case), or, assuming a type can appear at most once per category, take the mean count per type and select those where the mean == 1 (or >= 1).
dat <- read.table(header = TRUE, text="category type
A green
A purple
A orange
B yellow
B green
B orange
C green")
tbl <- data.frame(with(dat, ftable(category, type)))
tbl[with(tbl, ave(Freq, type)) >= 1, ]
# category type Freq
# 1 A green 1
# 2 B green 1
# 3 C green 1
unique(tbl[with(tbl, ave(Freq, type)) >= 1, 'type'])
# [1] green
Upvotes: 2
Reputation: 31171
If df is your data.frame, here is 'one' line of code thanks to Reduce:
x = df$category
y = df$type
Reduce(intersect, lapply(unique(x), function(u) y[x==u]))
#[1] "green"
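The same intersection idea can be wrapped in a small reusable function. This is only a sketch: the name types_in_all_categories is hypothetical, and it uses split() in place of the lapply() call above to group the types by category.

```r
# Hypothetical helper: return the types shared by every category
types_in_all_categories <- function(category, type) {
  # split() groups the types by category; Reduce() intersects the groups in turn
  Reduce(intersect, split(as.character(type), category))
}

df <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C"),
  type     = c("green", "purple", "orange", "yellow", "green", "orange", "green"),
  stringsAsFactors = FALSE
)

common <- types_in_all_categories(df$category, df$type)
common
```

Because intersect() works on sets, repeated types within a category do not affect the result.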
Upvotes: 2
Reputation: 51640
I couldn't really find a super-obvious solution, however this does the job.
df <- data.frame(category=c("A", "A", "A", "B", "B", "B", "C"),
type=c("green", "purple", "orange", "yellow",
"green", "orange", "green"))
# Split the data frame by type
# This gives a list with elements corresponding to each type
types <- split(df, df$type)
# Find the length of each element of the list
len <- sapply(types, function(t){length(t$type)})
# If the length is equal to the number of categories then
# the type is present in all categories
res <- names(which(len==length(unique(df$category))))
Note that sapply will return the types as names of the vector, hence the call to names in the last statement.
Upvotes: 2