Reputation: 1151
I would like to keep data frame attributes after some joins. It seems that dplyr
functions keep attributes on columns but drop the ones on the data frame itself.
See the example below:
library("dplyr")
library("lubridate")
#Fake data
n = 20
df <- data.frame("user_id" = 1:n,
                 "type" = sample(c(1,2), n, replace = TRUE),
                 "amount" = 1000*rnorm(n))
#Suppose I want to add some metadata
attr(df, "query_timestamp") <- lubridate::now()
attr(df$amount, "currency") <- "BRL"
#encoding table for user type
encode <- data.frame("type" = c(1,2),
                     "description" = c("vendor", "shopper"))
print(attr(df, "query_timestamp"))
[1] "2018-07-18 15:30:57 -03"
print(attr(df$amount, "currency"))
[1] "BRL"
df <- df %>% dplyr::left_join(encode, by = "type")
print(attr(df, "query_timestamp"))
NULL
print(attr(df$amount, "currency"))
[1] "BRL"
Is there any reason for that? I would like to keep attributes but avoid using aux variables to store them.
Upvotes: 0
Views: 1085
Reputation: 263381
You can "reattach" the attribute using the `attr<-` function:
df <- df %>% dplyr::left_join(encode, by = "type") %>%
`attr<-`("query_timestamp", attr(df,"query_timestamp") )
> print(attr(df, "query_timestamp"))
[1] "2018-07-18 14:41:39 PDT"
Normally the call would be one of:
attr(df, "query_timestamp") <- attr(df, "query_timestamp")
# or equivalently
`attr<-`(df, "query_timestamp", attr(df,"query_timestamp") )
But, as you probably know, the pipe supplies its left-hand side as the first argument, so that argument can be dropped. The whole right-hand side is evaluated before the destructive assignment (<-) overwrites df, so attr(df, "query_timestamp") still reads the attribute from the original data frame. So you either need to save the attribute as a separate value before the join and reattach it in a separate step after the join, or do it this way (reattaching just before the destructive "back"-assignment).
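If there are several attributes to carry over, the same reattachment idea can be wrapped in a small function. This is only a sketch: copy_custom_attrs is a hypothetical helper, not part of dplyr, and it assumes a plain data frame whose only structural attributes are names, row.names and class. The sample data is cut down from the question.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): copy every attribute of `from`
# onto `to`, except the structural ones listed in `skip`.
copy_custom_attrs <- function(to, from,
                              skip = c("names", "row.names", "class")) {
  extra <- attributes(from)
  extra <- extra[setdiff(names(extra), skip)]
  for (nm in names(extra)) {
    attr(to, nm) <- extra[[nm]]
  }
  to
}

# Small stand-in for the data in the question
df <- data.frame(user_id = 1:4, type = c(1, 2, 1, 2))
attr(df, "query_timestamp") <- Sys.time()

encode <- data.frame(type = c(1, 2),
                     description = c("vendor", "shopper"))

# `df` inside the call still refers to the data frame before the join
joined <- df %>%
  left_join(encode, by = "type") %>%
  copy_custom_attrs(df)

print(attr(joined, "query_timestamp"))
```

Assigning the result back to df instead of joined works the same way, since the right-hand side is fully evaluated before the assignment happens.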
Upvotes: 2