Big Rick

Reputation: 166

R: using the mutate function to combine strings

I've got a dataset called data1 with columns Year and Count.

My sample data looks like this:

  Year  Count
1 2005  3000  
2 2006  4000 
3 2007  5000
4 2008  6000   

I add another column to the data which works out the yearly increase. This is my code:

library(dplyr)
data1growth <- data1 %>%
  mutate(Growth = Count - lag(Count))

I want to add another column called Period so that I get the following output:

  Year  Count  Growth  Period
1 2005  3000   NA      NA
2 2006  4000   1000    2005-2006
3 2007  5000   1000    2006-2007
4 2008  6000   1000    2007-2008

What code should I add to the mutate function to get the desired output, or am I off the mark completely? Any help is appreciated.

Thanks everyone.

Upvotes: 0

Views: 248

Answers (3)

ThomasIsCoding

Reputation: 101393

Here is a base R option:

# data1 is the data frame from the question
transform(data1,
  Growth = c(NA, diff(Count)),
  Period = c(NA, paste0(Year[-nrow(data1)], "-", Year[-1]))
)

which gives

  Year Count Growth    Period
1 2005  3000     NA      <NA>
2 2006  4000   1000 2005-2006
3 2007  5000   1000 2006-2007
4 2008  6000   1000 2007-2008

Upvotes: 0

s_baldur

Reputation: 33488

library(dplyr)
data1 %>%
  mutate(
    Growth = Count - lag(Count),
    # The first row has no previous year, so its Period stays NA
    Period = if_else(
      row_number() > 1,
      paste0(lag(Year), "-", Year),
      NA_character_
    )
  )

#   Year Count Growth    Period
# 1 2005  3000     NA      <NA>
# 2 2006  4000   1000 2005-2006
# 3 2007  5000   1000 2006-2007
# 4 2008  6000   1000 2007-2008
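
A note on why the if_else() guard is there: paste0() turns a missing value into the literal text "NA", so pasting lag(Year) directly would label the first row "NA-2005" instead of leaving it missing. Below is a minimal base R sketch of that behaviour. The names prev, naive and guarded are illustrative only and not part of the code above, and prev is built by hand to mirror what lag(Year) returns.

```r
Year <- 2005:2008

# Hand-built stand-in for lag(Year): shift down by one, pad with NA
prev <- c(NA, head(Year, -1))

# Pasting without a guard: the NA becomes the string "NA"
naive <- paste0(prev, "-", Year)
naive
# [1] "NA-2005"   "2005-2006" "2006-2007" "2007-2008"

# Guarding on the missing previous year keeps a real NA in the first row
guarded <- ifelse(is.na(prev), NA_character_, paste0(prev, "-", Year))
guarded
# [1] NA          "2005-2006" "2006-2007" "2007-2008"
```

Testing is.na() on the lagged year rather than row_number() > 1 is just another way to write the same guard; both leave only the first row as NA here.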

Reproducible data

data1 <- data.frame(
  Year  = seq(2005L, 2008L, 1L),
  Count = seq(3000L, 6000L, 1000L) 
)

Upvotes: 1

csgroen

Reputation: 2541

If you want 'Period' to just be a string, you can add it with another mutate(). This builds each label from Year - 1, so it assumes the rows are consecutive years:

library(tidyverse)
data1 <- tibble(Year = 2005:2008, Count = c(3000, 4000, 5000, 6000))
data1growth <- data1 %>%
    mutate(Growth = Count - lag(Count))

# Period as string (previous year first, to match the requested format)
data1growth %>%
    mutate(Period = paste0(Year - 1, "-", Year))
#> # A tibble: 4 x 4
#>    Year Count Growth Period   
#>   <int> <dbl>  <dbl> <chr>    
#> 1  2005  3000     NA 2004-2005
#> 2  2006  4000   1000 2005-2006
#> 3  2007  5000   1000 2006-2007
#> 4  2008  6000   1000 2007-2008

# Period as string (NA wherever Growth is NA, previous year first)
data1growth %>%
    mutate(Period = ifelse(is.na(Growth), NA, paste0(Year - 1, "-", Year)))
#> # A tibble: 4 x 4
#>    Year Count Growth Period   
#>   <int> <dbl>  <dbl> <chr>    
#> 1  2005  3000     NA <NA>     
#> 2  2006  4000   1000 2005-2006
#> 3  2007  5000   1000 2006-2007
#> 4  2008  6000   1000 2007-2008

Upvotes: 0
