Tavi

Reputation: 2748

R: return common rows for each value in a given column

Assuming I have a data frame that looks like this:

    category  type
[1] A        green
[2] A        purple
[3] A        orange
[4] B        yellow
[5] B        green
[6] B        orange
[7] C        green

How do I get a list containing those types that appear in each category? In this case it should look like:

    type
[1] green

I know this question is basic and has probably been asked before, but my method is too long and I'm sure there's a more efficient way: I split the data frame by category and take the set intersection of the pieces. Is there a better way? Thanks!

Upvotes: 3

Views: 140

Answers (6)

nrussell

Reputation: 18602

Here's one approach using data.table - provided that type only appears at most once per category:

library(data.table)
DT <- data.table(DF)
##
R> DT[
    ,list(
      nCat=.N
    ),by=type][
      nCat==length(unique(DT$category)),
      type]
[1] "green"

All this does is aggregate the original data into a count of rows by type (nCat), and then subset that result to the rows where nCat equals the number of unique categories in DT.

Edit: Thanks to @Arun, this can be done more concisely with a newer version of data.table by taking advantage of the uniqueN function:

unique(DT)[, .N, by=type][N == uniqueN(DT$category), type]

If you aren't guaranteed that type will appear at most once per category, you can make a slight modification to the above and count distinct categories instead of rows:

R> DT[
    ,list(
      nCat=length(unique(category))
    ),by=type][
      nCat==length(unique(DT$category)),
      type]
[1] "green" 
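
For readers without data.table, the same distinct-count idea can be sketched in base R. This is only an illustration: DF2 is a made-up data set (not from the question) in which "green" is listed twice in category A, so a plain row count would give 4 for "green" and miss it.

```r
# Hypothetical data: "green" is listed twice in category A
DF2 <- data.frame(
  category = c("A", "A", "A", "B", "B", "C"),
  type     = c("green", "green", "orange", "green", "orange", "green"),
  stringsAsFactors = FALSE
)

# Number of distinct categories each type appears in
n_cat <- tapply(DF2$category, DF2$type, function(x) length(unique(x)))

# Keep the types that were seen in every category
common <- names(n_cat)[n_cat == length(unique(DF2$category))]
common
```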

Data:

DF <- read.table(
  text="category  type
A        green
A        purple
A        orange
B        yellow
B        green
B        orange
C        green",
  header=TRUE,
  stringsAsFactors=FALSE)

Upvotes: 3

DatamineR

Reputation: 9618

Assuming a type appears in a category at most once (otherwise change the == to >=) and using table you could try the following:

 colnames(table(df))[colSums(table(df)) == length(unique(df$category))]
[1] "green"
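
If duplicates are possible, comparing raw counts may over-select, because a type repeated often enough inside a single category can reach the threshold on its own. One possible safeguard, sketched here on made-up data (df2 is hypothetical, not from the question), is to turn the table into presence/absence before summing:

```r
# Hypothetical data: "orange" occurs three times, but only in category A
df2 <- data.frame(
  category = c("A", "A", "A", "A", "B", "C"),
  type     = c("green", "orange", "orange", "orange", "green", "green"),
  stringsAsFactors = FALSE
)

tab <- table(df2)                      # rows: category, columns: type
n_categories <- length(unique(df2$category))

# Raw counts: "orange" totals 3 even though it is absent from B and C
by_count <- colnames(tab)[colSums(tab) >= n_categories]

# Presence/absence: count the categories in which each type occurs at all
by_presence <- colnames(tab)[colSums(tab > 0) == n_categories]
by_presence
```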

Upvotes: 4

Joe

Reputation: 3991

Assuming your data are in df:

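# Count rows per type; aggregate() stores the counts in column x
df.sum <- aggregate(df$type, by = list(type = df$type), FUN = length)
# Keep the types whose count equals the number of distinct categories
types <- df.sum[which(df.sum$x == length(unique(df$category))), ]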

This will count the number of appearances of each type, and see which ones appear as many times as you have categories. If types don't appear more than once in a category, it will effectively do what you want, though it will not work if that assumption is violated.
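
If that assumption might not hold, one option is to drop duplicated category/type pairs before counting. A minimal sketch, assuming a made-up data frame df2 with a repeated row (the names df2, df.u and counts are only illustrative):

```r
# Hypothetical data: the pair (A, green) is listed twice
df2 <- data.frame(
  category = c("A", "A", "A", "B", "C"),
  type     = c("green", "green", "orange", "green", "green"),
  stringsAsFactors = FALSE
)

# Keep each category/type pair only once
df.u <- unique(df2[, c("category", "type")])

# Count the remaining rows per type; counts are stored in column x
counts <- aggregate(df.u$category, by = list(type = df.u$type), FUN = length)

# Types whose count equals the number of distinct categories
common <- counts$type[counts$x == length(unique(df.u$category))]
common
```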

Upvotes: 1

rawr

Reputation: 20811

One way would be to make a table and either select the types whose total count equals the number of categories (3 in this case) or, since you say a type can appear only once per category, take the mean count per type and select those where the mean == 1 (or >= 1).

dat <- read.table(header = TRUE, text="category  type
A        green
A        purple
A        orange
B        yellow
B        green
B        orange
C        green")

tbl <- data.frame(with(dat, ftable(category, type)))
tbl[with(tbl, ave(Freq, type)) >= 1, ]

#   category  type Freq
# 1        A green    1
# 2        B green    1
# 3        C green    1

unique(tbl[with(tbl, ave(Freq, type)) >= 1, 'type'])
# [1] green

Upvotes: 2

Colonel Beauvel

Reputation: 31171

If df is your data.frame, here is 'one' line of code thanks to Reduce:

x = df$category
y = df$type

Reduce(intersect, lapply(unique(x), function(u) y[x==u]))
#[1] "green"
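
A slightly shorter variant of the same idea, offered as a sketch rather than as part of the original answer, lets split() build the per-category vectors. The data frame below simply reproduces the question's example:

```r
# Example data from the question
df <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C"),
  type     = c("green", "purple", "orange", "yellow",
               "green", "orange", "green"),
  stringsAsFactors = FALSE
)

# One character vector of types per category
by_category <- split(df$type, df$category)

# Fold intersect() over the list to keep only the types shared by all
common <- Reduce(intersect, by_category)
common
```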

Upvotes: 2

nico

Reputation: 51640

I couldn't really find a super-obvious solution; however, this does the job.

df <- data.frame(category=c("A", "A", "A", "B", "B", "B", "C"), 
                 type=c("green", "purple", "orange", "yellow", 
                        "green", "orange", "green"))

# Split the data frame by type
# This gives a list with elements corresponding to each type
types <- split(df, df$type)

# Find the length of each element of the list
len <- sapply(types, function(t){length(t$type)})

# If the length is equal to the number of categories then 
# the type is present in all categories 
res <- names(which(len==length(unique(df$category))))

Note that sapply returns a named vector whose names are the types, hence the call to names in the last statement.

Upvotes: 2
