statistics_learning

Reputation: 437

How to subset a dataset based on another dataset?

I want to subset the dataframe z so that it meets the following requirements:

1. Remove the observations from dataframe z whose value in column 1 is not among the row names of matrix y.

2. Remove the observations from dataframe z whose value in column 2 is not among the column names of matrix y.

The following code creates dataframe z and matrix y:

    ## a matrix y
    y <- matrix(1:12, nrow = 3, ncol = 4, dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3", "C4")))

    ## a dataframe z
    col1 <- c("R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9")
    col2 <- c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9")

    z <- data.frame(col1, col2)

The form of dataframe z is: (image: data frame z)

The form of matrix y is: (image: matrix y)

The output dataframe I want should be:

      col1 col2
    1   R1   C1
    2   R2   C2
    3   R3   C3

I have no idea how to subset the dataframe based on the row and column names of a matrix. Does anyone know how to do that? Thanks in advance.

Upvotes: 1

Views: 1217

Answers (1)

floattube

Reputation: 26

Try the following:

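    # Specification of z and y:
    y <- matrix(1:12, nrow = 3, ncol = 4, dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3", "C4")))

    col1 <- c("R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8", "R9")
    col2 <- c("C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9")

    z <- data.frame(col1, col2)

    # Extract subset:
    # TRUE where column 1 of z is one of the row names of y
    col.in.y.row = ( z[,1] %in% row.names(y) )

    # TRUE where column 2 of z is one of the column names of y
    col.in.y.col = ( z[,2] %in% colnames(y) )

    # Optional: show z next to the two logical flags to check them
    data.frame(z, col.in.y.row, col.in.y.col)

    # Keep only the rows where both conditions hold (outer parentheses print the result)
    (z.subsetted = z[ col.in.y.row & col.in.y.col, ])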
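If you need this more than once, the same two `%in%` tests can be wrapped in a small function. This is only a sketch: the name `subset_by_dimnames` and its arguments `df` and `m` are made up for illustration, and it assumes the row labels are in the first column of the dataframe and the column labels are in the second.

```r
# Hypothetical helper (name and arguments are illustrative):
# keep the rows of `df` whose first column is a row name of `m`
# and whose second column is a column name of `m`.
subset_by_dimnames <- function(df, m) {
  keep <- df[[1]] %in% rownames(m) & df[[2]] %in% colnames(m)
  df[keep, , drop = FALSE]
}

# Same example data as in the question
y <- matrix(1:12, nrow = 3, ncol = 4,
            dimnames = list(c("R1", "R2", "R3"), c("C1", "C2", "C3", "C4")))
z <- data.frame(col1 = paste0("R", 1:9), col2 = paste0("C", 1:9))

z_sub <- subset_by_dimnames(z, y)
print(z_sub)
```

Note that a row is kept only when both conditions hold, so the row with R4 and C4 is dropped even though C4 is a column name of y.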

Upvotes: 1
