Victor Mayrink

Reputation: 1151

R keep data frame attributes after join

I would like to keep data frame attributes after some joins. It seems that dplyr functions keep the attributes of columns but not those of the data frame itself.

See the example below:

library("dplyr")
library("lubridate")


#Fake data
n = 20
df <- data.frame("user_id" = 1:n,
                 "type" = sample(c(1,2), n, replace = T),
                 "amount" = 1000*rnorm(n))

#Suppose I want to add some metadata
attr(df, "query_timestamp") <- lubridate::now()
attr(df$amount, "currency") <- "BRL"

#encoding table for user type
encode <- data.frame("type" = c(1,2), 
                     "description" = c("vendor", "shopper"))

print(attr(df, "query_timestamp"))

[1] "2018-07-18 15:30:57 -03"

print(attr(df$amount, "currency"))

[1] "BRL"

df <- df %>% dplyr::left_join(encode, by = "type")
print(attr(df, "query_timestamp"))

NULL

print(attr(df$amount, "currency"))

[1] "BRL"

Is there any reason for that? I would like to keep attributes but avoid using aux variables to store them.

Upvotes: 0

Views: 1085

Answers (1)

IRTFM

Reputation: 263381

You can "reattach" the attribute using the attr<- function:

df <- df %>% dplyr::left_join(encode, by = "type") %>% 
                         `attr<-`("query_timestamp", attr(df,"query_timestamp") )

> print(attr(df, "query_timestamp"))
[1] "2018-07-18 14:41:39 PDT"

Normally the call would be one of:

attr(df, "query_timestamp") <- attr(df, "query_timestamp")
# or equivalently
`attr<-`(df, "query_timestamp", attr(df,"query_timestamp") )

But when the call sits on the right-hand side of a pipe, the first argument can be dropped, because %>% passes the object being processed as that argument. The whole right-hand side, including attr(df, "query_timestamp"), is evaluated while df still refers to the original data frame, so the attribute is reattached before the destructive assignment (<-) overwrites df. So you need to either save the attribute before the join as a separate value and reattach it in a separate step after the join, or do it this way (reattaching just before the destructive "back"-assignment).
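For comparison, here is a minimal sketch of the other option mentioned above: saving the attribute before the join and reattaching it afterwards. The variable name ts and the smaller five-row sample are illustrative choices, not part of the question, and Sys.time() stands in for lubridate::now() to keep the example short. The reattachment is harmless even if the join happens to keep the attribute.

```r
library(dplyr)

# Small stand-in for the data in the question (5 rows instead of 20)
set.seed(1)
df <- data.frame(user_id = 1:5,
                 type = sample(c(1, 2), 5, replace = TRUE),
                 amount = 1000 * rnorm(5))
attr(df, "query_timestamp") <- Sys.time()

encode <- data.frame(type = c(1, 2),
                     description = c("vendor", "shopper"))

# Save the attribute first, because df is overwritten by the join result
ts <- attr(df, "query_timestamp")

df <- left_join(df, encode, by = "type")

# Reattach the saved value in a separate step
attr(df, "query_timestamp") <- ts

print(attr(df, "query_timestamp"))
```

This costs one auxiliary variable, which the piped version avoids, but it keeps the join and the metadata handling as two separate, readable steps.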

Upvotes: 2
