GBen

Reputation: 25

Add tag/label/attr/attribute to dataframe columns (variables)

I find the terminology around attributes/tags/labels (or whatever they are called) for data frame columns quite confusing.

The question is simple: I have a data frame (let's call it stackedDataStd) with cases (rows) and variables (columns). Each column has a name, and you can access it in many different ways (IMHO, having ten ways to do the same task makes everything more confusing...).

When I remove a column, the corresponding name is also removed from the attributes.

Now I have many other tags that identify the types of the variables. Take the bw tag as an example (an integer, in my case 1, 2, 4, 8 or 16). Each column has its own bw, and what I want is to select, say, all the columns with bw==1.

At the moment I use what I call a selector (selectorBw): a vector of length dim(stackedDataStd)[2] holding the bw value of each column.

By using stackedDataStd[,selectorBw == 1] I select the ones with bw==1.

But every time I remove a column, I have to remember to remove the corresponding position in the selector (and with ten selectors it starts to become a mess).

stackedDataStd[,-4]
selectorBw[-4]

I tried to add attributes in several ways:

attr(stackedDataStd,'bw') <- selectorBw

adds the attribute, but it is not linked to the columns the way colnames is. If I remove one column (stackedDataStd$segN <- c()), the attribute is not shortened. If I remove a column with stackedDataStd <- stackedDataStd[,-1], the attribute disappears altogether... (magic).
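
The behaviour described above can be reproduced with a small sketch. The data frame `df` and the vector `sel` are hypothetical stand-ins for stackedDataStd and selectorBw, and the values are made up for illustration:

```r
# Hypothetical stand-ins for stackedDataStd and selectorBw
df  <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12)
sel <- c(1, 1, 2, 4)

# Attach the selector to the data frame as a whole
attr(df, "bw") <- sel

# Removing a column by assignment leaves the attribute untouched,
# so it ends up with one more entry than there are columns
df1 <- df
df1$A <- NULL
ncol(df1)
length(attr(df1, "bw"))

# Subsetting with `[` builds a new data frame without the custom attribute
df2 <- df[, -1]
is.null(attr(df2, "bw"))
```

In both cases the attribute stops matching the columns, which is why a data-frame-level attribute cannot act as a reliable selector.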

Here it has been suggested to assign the attribute to each element of the list:

for (i in seq_along(stackedDataStd)) { attr(stackedDataStd[[i]], "bw") <- selectorBw[i] }

but it can't be used on the whole data frame, since it's not a data frame attribute (I would have to loop over every variable for every selection).

I have tried other ways, but I don't want to bother you with my attempts...

Do you have any suggestions? Maybe with the Hmisc package? I would rather not end up with a non-standard data frame, if possible.

Upvotes: 0

Views: 4266

Answers (2)

GBen

Reputation: 25

As suggested by @Allan Cameron I wrote my own functions to manage the tags.

With non-negligible effort, I made a small edit to his code to allow several tags to be declared explicitly, and I added an attribute ('tags') to the main data frame holding the names of all the tags. If you have further suggestions for improving it, please feel free to reply.

NB: subsetting removes the attribute 'tags' in the main df.

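# Subsetter: keep the columns whose attribute `tag` equals `val`
ss <- function(df, tag, val) {
  # `||` short-circuits, so df[[1]] is never evaluated on a zero-column input
  if (!is.data.frame(df) || ncol(df) < 1 || is.null(attr(df[[1]], tag))) {
    data.frame()
  } else {
    df[sapply(df, function(x) attr(x, tag) == val)]
  }
}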

# Gets attributes as vector
get_col_attr <- function(df, tag = 'all') {
  if (tag != 'all') {
    sapply(df, function(x) attr(x, tag))
  } else {
    as.data.frame(mapply(function(z) sapply(attr(df, 'tags'), function(x) attr(z, x)), df, SIMPLIFY = FALSE))
  }
}

# Sets attributes to columns from a single vector
set_col_attr <- function(df, tag, attrs) {
  outdf <- as.data.frame(mapply(function(col, tagName, bw) {
    attr(col, tagName) <- bw
    col
  }, df, tag, attrs, SIMPLIFY = FALSE))
  attr(outdf, 'tags') <- unique(c(attr(df, 'tags'), tag))
  outdf
}
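
A short usage sketch may help show how several tags can coexist. The setter is restated from the code above so the block runs on its own; the subsetter is a reduced version without the input checks, and the data frame and the `unit` tag are hypothetical, chosen only for illustration:

```r
# Setter restated from the answer so the sketch is self-contained
set_col_attr <- function(df, tag, attrs) {
  outdf <- as.data.frame(mapply(function(col, tagName, value) {
    attr(col, tagName) <- value
    col
  }, df, tag, attrs, SIMPLIFY = FALSE))
  attr(outdf, "tags") <- unique(c(attr(df, "tags"), tag))
  outdf
}

# Reduced subsetter (no input checks), for illustration only
ss <- function(df, tag, val) df[sapply(df, function(x) attr(x, tag) == val)]

# Hypothetical data: three columns carrying a bw tag and a unit tag
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)
df <- set_col_attr(df, "bw", c(1, 2, 1))
df <- set_col_attr(df, "unit", c("mm", "mm", "kg"))

attr(df, "tags")             # names of all tags set so far
names(ss(df, "bw", 1))       # columns with bw == 1
names(ss(df, "unit", "mm"))  # columns with unit == "mm"
```

Each call to the setter adds one tag to every column and records the tag name on the data frame, so earlier tags remain in place when a new one is added.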

Upvotes: 1

Allan Cameron

Reputation: 174338

If you want to build extra functionality into data frames using attributes but without specifying a new S3 class, you have to define that functionality somewhere else. It's really pretty easy to do by adding an attribute setter, an attribute getter, and a little subsetting function:

# Subsetter
ss <- function(df, bw) df[sapply(df, function(x) attr(x, "bw") == bw)]

# Gets attributes as vector
get_col_attr <- function(df) sapply(df, attr, "bw")

# Sets attributes to columns from a single vector
set_col_attr <- function(df, attrs)
{
  as.data.frame(mapply(function(col, bw) {
                         attr(col, "bw") <- bw
                         col
                         }, df, attrs, SIMPLIFY = FALSE))
}

I think this works quite nicely. Suppose we create a little data frame with two numeric columns and two character columns, and we wish to give the columns the attributes c(1, 1, 2, 2). We can just do:

df  <- data.frame(A = 1:5, B = 6:10, C = LETTERS[1:5], D = letters[1:5])
df  <- set_col_attr(df, c(1, 1, 2, 2))

This still looks and behaves like a normal data frame:

df
#>   A  B C D
#> 1 1  6 A a
#> 2 2  7 B b
#> 3 3  8 C c
#> 4 4  9 D d
#> 5 5 10 E e

But we can see each column has a bw attribute:

get_col_attr(df)
#> A B C D 
#> 1 1 2 2

And we can use this attribute to subset very easily:

ss(df, bw = 2)
#>   C D
#> 1 A a
#> 2 B b
#> 3 C c
#> 4 D d
#> 5 E e

ss(df, bw = 1)
#>   A  B
#> 1 1  6
#> 2 2  7
#> 3 3  8
#> 4 4  9
#> 5 5 10

Crucially, if we subset the data frame by column, the attributes stay with the columns that remain:

df2 <- df[, 2:3]

get_col_attr(df2)
#> B C 
#> 1 2
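
The same holds when a column is removed by assignment, which is the case raised in the question. A minimal sketch, with the setter and getter restated from above so that it runs on its own (it assumes R 4.0 or later, where character columns are not converted to factors):

```r
# Setter and getter restated from the answer so the sketch is self-contained
set_col_attr <- function(df, attrs) {
  as.data.frame(mapply(function(col, bw) {
    attr(col, "bw") <- bw
    col
  }, df, attrs, SIMPLIFY = FALSE))
}
get_col_attr <- function(df) sapply(df, attr, "bw")

df <- data.frame(A = 1:5, B = 6:10, C = LETTERS[1:5], D = letters[1:5])
df <- set_col_attr(df, c(1, 1, 2, 2))

# Drop a column by assignment; there is no separate selector to update
df$A <- NULL
get_col_attr(df)
```

Because each bw value lives on its own column, the remaining columns B, C and D keep their values without any extra bookkeeping.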

Created on 2020-07-03 by the reprex package (v0.3.0)

Upvotes: 3
