Chetan Arvind Patil

Reputation: 866

Subset Data Based on Column Attributes

I want to subset a data frame by keeping only the columns whose attribute has a specific value. In the example below, that value is "2200".

Checking each column by hand, as in the following code, does not scale. I am looking for an efficient approach that also works on a much larger data set.

> identical(attr(data[,1], "metadata")$DP.SomeNumber, "2200")
[1] FALSE
> identical(attr(data[,2], "metadata")$DP.SomeNumber, "2200")
[1] TRUE
> identical(attr(data[,3], "metadata")$DP.SomeNumber, "2200")
[1] TRUE
> identical(attr(data[,4], "metadata")$DP.SomeNumber, "2200")
[1] TRUE
> identical(attr(data[,5], "metadata")$DP.SomeNumber, "2200")
[1] FALSE
> identical(attr(data[,6], "metadata")$DP.SomeNumber, "2200")
[1] FALSE

Also, attr() works on one object at a time, so I cannot pass it all the columns at once. Any suggestions on how to build a subset based on attribute values across all columns, programmatically and efficiently?


Reproducible Data

column1 <- rep(-0.01, 8)
attr(column1, "metadata")$DP.SomeNumber <- "1200"
column2 <- rep(0.05, 8)
attr(column2, "metadata")$DP.SomeNumber <- "2200"
column3 <- rep(-0.01, 8)
attr(column3, "metadata")$DP.SomeNumber <- "2200"
column4 <- rep(0.05, 8)
attr(column4, "metadata")$DP.SomeNumber <- "2200"
column5 <- rep(-0.01, 8)
attr(column5, "metadata")$DP.SomeNumber <- "5200"
column6 <- rep(0.05, 8)
attr(column6, "metadata")$DP.SomeNumber <- "6200"

data <- data.frame(column1, column2, column3, column4, column5, column6)

Output of Above Data

> attr(data$column1, "metadata")$DP.SomeNumber
[1] "1200"
> attr(data$column2, "metadata")$DP.SomeNumber
[1] "2200"
> attr(data$column3, "metadata")$DP.SomeNumber
[1] "2200"
> attr(data$column4, "metadata")$DP.SomeNumber
[1] "2200"
> attr(data$column5, "metadata")$DP.SomeNumber
[1] "5200"
> attr(data$column6, "metadata")$DP.SomeNumber
[1] "6200"

Upvotes: 0

Views: 580

Answers (1)

Troy

Reputation: 131

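Using sapply should do it: it applies the attribute check to every column and returns a logical vector that can be used to index the columns. Based on your reproducible data frame, the following code will yield your desired result: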

data[, sapply(data, function(x) attr(x, "metadata")$DP.SomeNumber == "2200")]
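A slightly more defensive variant, as a sketch only: it assumes some columns in a larger data set might lack the "metadata" attribute, and that only one column might match. In those cases `==` returns a zero-length result and `[` drops a single column to a plain vector. The helpers `make_col` and `has_number` are hypothetical names introduced for this illustration; `vapply`, `identical`, and `drop = FALSE` are base R.

```r
# Hypothetical helper: build a column carrying the "metadata" attribute,
# mirroring the reproducible data in the question.
make_col <- function(value, number) {
  x <- rep(value, 8)
  attr(x, "metadata") <- list(DP.SomeNumber = number)
  x
}

data <- data.frame(
  column1 = make_col(-0.01, "1200"),
  column2 = make_col(0.05, "2200"),
  column3 = make_col(-0.01, "2200"),
  column4 = make_col(0.05, "2200"),
  column5 = make_col(-0.01, "5200"),
  column6 = make_col(0.05, "6200")
)

# Hypothetical helper: TRUE only if the column's attribute equals `target`.
# identical() returns FALSE (not logical(0)) when the attribute is missing.
has_number <- function(x, target) {
  identical(attr(x, "metadata")$DP.SomeNumber, target)
}

# vapply() guarantees exactly one logical per column.
keep <- vapply(data, has_number, logical(1), target = "2200")

# drop = FALSE keeps a data frame even if only one column matches.
subset_data <- data[, keep, drop = FALSE]

names(subset_data)
# [1] "column2" "column3" "column4"
```

The check still visits each column once, so the cost grows with the number of columns rather than the number of rows.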

Upvotes: 1
