JCB
JCB

Reputation: 57

Rename a dataframe Column with text from within the column itself

Given a (simplified) dataframe with format

df <- data.frame(a = c(1,2,3,4), 
                 b = c(4,3,2,1), 
                 temp1 = c("-","-","-","foo: 3"), 
                 temp2 = c("-","bar: 10","-","bar: 4")
                 )

a  b   temp1    temp2
1  4     -        -
2  3     -     bar: 10
3  2     -        -
4  1  foo: 3   bar: 4

I need to rename all temp columns with the names contained within the column, My end goal is to end up with this:

a  b    foo      bar
1  4     -        -
2  3     -        10
3  2     -        -
4  1     3        4

the df column names and the data contained within them will be unknown, however the columns that need changing will contain temp and the delimiter will always be a ":"

As such I can easily remove the name from within the columns using dplyr like this:

df <- df %>% 
  mutate_at(vars(contains("temp")), ~(substr(., str_locate(., ":")+1,str_length(.))))

but first I need to rename the columns based on some function method, that scans the column and returns the value(s) within it, ie.

rename_at(vars(contains("temp")), ~(...some function.....)) 

As per the example given there's no guarantee that specific rows will have data so I can't simply grab value from row 1

Any ideas welcome. Thanks in advance

Upvotes: 3

Views: 152

Answers (2)

hello_friend
hello_friend

Reputation: 5788

This will do the job:

colnames(df)[which(grepl("temp", colnames(df)))] <- unique(unlist(sapply(df[,grepl("temp", colnames(df))],

                                                            function(x){gsub("[:].*",

                                                                             "",

                                                                             grep("\\w+",

                                                                                  x,

                                                                                  value = TRUE))})))

Upvotes: 0

tmfmnk
tmfmnk

Reputation: 39858

One possibility involving dplyr and tidyr could be:

df %>%
 pivot_longer(names_to = "variables", values_to = "values", -c(a:b)) %>%
 mutate(values = replace(values, values == "-", NA_character_)) %>%
 separate(values, into = c("variables2", "values"), sep = ": ") %>%
 group_by(variables) %>%
 fill(variables2, .direction = "downup") %>%
 ungroup() %>%
 select(-variables) %>%
 pivot_wider(names_from = "variables2", values_from = "values")

      a     b foo   bar  
  <dbl> <dbl> <chr> <chr>
1     1     4 <NA>  <NA> 
2     2     3 <NA>  10   
3     3     2 <NA>  <NA> 
4     4     1 3     4   

If you want to further replace the NAs with -:

df %>%
 pivot_longer(names_to = "variables", values_to = "values", -c(a:b)) %>%
 mutate(values = replace(values, values == "-", NA_character_)) %>%
 separate(values, into = c("variables2", "values"), sep = ": ") %>%
 group_by(variables) %>%
 fill(variables2, .direction = "downup") %>%
 ungroup() %>%
 select(-variables) %>%
 pivot_wider(names_from = "variables2", values_from = "values") %>%
 mutate_at(vars(-a, -b), ~ replace_na(., "-"))

      a     b foo   bar  
  <dbl> <dbl> <chr> <chr>
1     1     4 -     -    
2     2     3 -     10   
3     3     2 -     -    
4     4     1 3     4   

Upvotes: 2

Related Questions