Column manipulation in R - data stored as column name

Question

I have a data frame of strangely formatted data, where information is stored as part of the column name.

library(tidyverse)

Ihave <- frame_data(
  ~ID,~group,~AAA_info2_BBB,~CCC_info3_DDD,
  "first",  1, as.Date("1970-01-01"), as.Date("1970-01-02"),
  "second", 2, as.Date("1971-01-01"), as.Date("1971-01-02"),
  "third",  3, as.Date("1972-01-01"), as.Date("1972-01-02"),
)

# A tibble: 3 x 4
  ID     group AAA_info2_BBB CCC_info3_DDD
                    
1 first      1 1970-01-01    1970-01-02   
2 second     2 1971-01-01    1971-01-02   
3 third      3 1972-01-01    1972-01-02

I would need to get the information back in the data frame, as shown below

Iwant <-  frame_data(
  ~ID,~group,~source,~variable,~value,~period,
  "first",  1, "AAA", "info1", as.Date("1970-01-01"), "BBB",
  "second", 2, "AAA", "info1", as.Date("1971-01-01"), "BBB",
  "third",  3, "AAA", "info1", as.Date("1972-01-01"), "BBB",
  "first",  1, "CCC", "info2", as.Date("1970-01-02"), "DDD",
  "second", 2, "CCC", "info2", as.Date("1971-01-02"), "DDD",
  "third",  3, "CCC", "info2", as.Date("1972-01-02"), "DDD",
)

# A tibble: 6 x 6
  ID     group source variable value      period
                 
1 first      1 AAA    info1    1970-01-01 BBB   
2 second     2 AAA    info1    1971-01-01 BBB   
3 third      3 AAA    info1    1972-01-01 BBB   
4 first      1 CCC    info2    1970-01-02 DDD   
5 second     2 CCC    info2    1971-01-02 DDD   
6 third      3 CCC    info2    1972-01-02 DDD

I though it could work by writing a function that handles one of the "AAA_info2_BBB"-type columns at a time, and it seems to work using the following function

my_fun <- function(df, one_var) {

  # Get string from called column name
  one_var_char <- 
    enquo(one_var) %>%  
    { as.character(.)[2] }  

  # Split string across "_" and return character vector
  one_var_char_splitted <- 
    one_var_char %>% 
    { strsplit(., "_")[[1]] }

  new_one_var <- one_var_char_splitted[2]

  names(df)[names(df) == one_var_char] <- new_one_var

  df %>%
    select(new_one_var) %>% 
    data.frame(source = one_var_char_splitted[1],
               period = one_var_char_splitted[3] )
}

Which returns (as expected)

Ihave %>% 
  select(ID, group, AAA_info2_BBB) %>% 
  my_fun(AAA_info2_BBB)

       info2 source period
1 1970-01-01    AAA    BBB
2 1971-01-01    AAA    BBB
3 1972-01-01    AAA    BBB

But I cannot manage to "map" this function onto the Ihave data frame to produce the desired Iwant. I have tried several mixtures of purrr::map, without success. Is there a flaw in my approach? Have I missed something?

Any help much appreciated!

IceCreamToucan · Accepted Answer

Same idea as gather then separate, but just for the sake of variety here's a data.table method using melt and tstrsplit

library(data.table)
setDT(Ihave)

melt(Ihave, c('ID', 'group'))[, 
  c('source', 'variable', 'period') := tstrsplit(variable, '_')]

#        ID group variable      value source period
# 1:  first     1    info2 1970-01-01    AAA    BBB
# 2: second     2    info2 1971-01-01    AAA    BBB
# 3:  third     3    info2 1972-01-01    AAA    BBB
# 4:  first     1    info3 1970-01-02    CCC    DDD
# 5: second     2    info3 1971-01-02    CCC    DDD
# 6:  third     3    info3 1972-01-02    CCC    DDD

Column manipulation in R - data stored as column name

Answers (2)

Related Questions