Jo_
Jo_

Reputation: 69

Renaming multiple columns and gathering with dplyr in R

I am trying to find a convenient way to rename multiple columns using the tidyverse. Say I have a tibble

df <- tibble(a = 1, b = 2, tmp_2000 = 23, tmp_2001 = 22.1, tmp_2002 = 25, pre_2000, pre_2001, pre_2002)

# A tibble: 1 x 8
  a     b tmp_2000 tmp_2001 tmp_2002 pre_2000 pre_2001 pre_2002
<dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
  1     2       23     22.1       25      100      103      189

temp and pre stand for temperature and precipitations. I want to reorganize this table in a tidy form, i.e. with one column for temperature, one for precipitations, and each row is the corresponding value for the year.

Right now the only option I found was to do something like this

df <- df %>%
  select(-starts_with("pre"))

names(df)[3:5] <- substr(names(df)[3:5],5,8) 

df<-df %>%
  gather(`2000`:`2002`,key = "year",value="temp")  %>%
  mutate("year" = as.integer(year)) 

# A tibble: 3 x 4
  a     b  year  temp
<dbl> <dbl> <int> <dbl>
  1     2  2000  23  
  1     2  2001  22.1
  1     2  2002  25 

This is not fantastic as I need to do the same thing with precipitations and then joins the two tables. In the future I will be getting more weather variables and this process will quickly become a pain.

Does anyone have an idea on how to do this more efficiently using the tidyverse?

Thanks,

Jo

PS: the only similar posts I saw referred to recoding the variables (using mutate_at), or renaming columns using the names that I showed above.

Upvotes: 3

Views: 868

Answers (3)

Nettle
Nettle

Reputation: 3321

df <- tibble(
  a = 1,
  b = 2,
  tmp_2000 = 23,
  tmp_2001 = 22.1,
  tmp_2002 = 25,
  pre_2000=100,
  pre_2001=103,
  pre_2002=189
)


df %>% 
  gather(key, value, -a:-b) %>% 
  separate(key, c("type", "year")) %>% 
  spread(type, value= value )

#> # A tibble: 3 x 5
#>       a     b year    pre   tmp
#>   <dbl> <dbl> <chr> <dbl> <dbl>
#> 1     1     2 2000    100  23  
#> 2     1     2 2001    103  22.1
#> 3     1     2 2002    189  25

```

Upvotes: 0

Onyambu
Onyambu

Reputation: 79188

data.frame(df)%>%
   reshape(3:ncol(df),sep="_",dir="long")%>%
   `rownames<-`(NULL)
  a b time  tmp pre id
1 1 2 2000 23.0 100  1
2 1 2 2001 22.1 103  1
3 1 2 2002 25.0 189  1

Upvotes: 2

AndS.
AndS.

Reputation: 8110

You could do it like this:

library(tidyverse)
df %>%
    gather(measure, value, -a, -b) %>% 
    separate(measure, into = c("type", "year"), sep = "_") %>% 
    mutate(type = case_when(type == "tmp" ~ "temp", type == "pre" ~ "precip")) %>% 
    spread(type, value)
#       a     b year  precip  temp
# 1     1     2 2000     100  23  
# 2     1     2 2001     103  22.1
# 3     1     2 2002     189  25  

We first gather all of the data in long format, then we separate the year from the measurement, then we change the names of the measurements, and lastly we spread the data back to wide format.

Upvotes: 2

Related Questions