Reputation: 857
I want to create a country_year variable that is conditioned on the occurrence of countries and years, as shown in the small subsample below. That is, if I have 2 countries with 3 different years each, the new country_year variable will take the values country1_year1, country1_year2, etc.
It seems so simple, but I am new to R and could not find an existing question that covers it. Could someone guide me a bit, please?
df <- structure(list(id = structure(c(1, 1, 1, 2, 2, 2), format.stata = "%9.0g"),
country = structure(c("US", "US", "US", "UK", "UK", "UK"), format.stata = "%9s"),
year = structure(c(2003, 2004, 2005, 2003, 2004, 2005), format.stata = "%9.0g"),
country_year = structure(c(1, 2, 3, 4, 5, 6), format.stata = "%9.0g")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 43
Reputation: 887741
An option with the tidyverse would be unite() from tidyr:
library(dplyr)
library(tidyr)
df %>%
  unite(country_year, country, year, sep = "_", remove = FALSE)
Output:
# A tibble: 6 x 4
#      id country_year country  year
#   <dbl> <chr>        <chr>   <dbl>
# 1     1 US_2003      US       2003
# 2     1 US_2004      US       2004
# 3     1 US_2005      US       2005
# 4     2 UK_2003      UK       2003
# 5     2 UK_2004      UK       2004
# 6     2 UK_2005      UK       2005
Upvotes: 1
Reputation: 7413
It seems like you want to create a new variable, country_year.
Using base R:
df$country_year <- paste0(df$country, "_", df$year)
Using dplyr:
library(dplyr)
df %>%
  mutate(country_year = paste0(country, "_", year))
This gives us:
     id country  year country_year
  <dbl> <chr>   <dbl> <chr>
1     1 US       2003 US_2003
2     1 US       2004 US_2004
3     1 US       2005 US_2005
4     2 UK       2003 UK_2003
5     2 UK       2004 UK_2004
6     2 UK       2005 UK_2005
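The sample data in the question stores country_year as the numbers 1 to 6 rather than as strings. If a numeric identifier is what is needed, one possible approach is sketched below in base R. It assumes the IDs should follow the order in which each country/year pair first appears. The column name country_year_id is hypothetical, and the data frame is rebuilt by hand here without the Stata attributes.

```r
# Rebuild the question's sample data as a plain data frame
# (assumption: the format.stata attributes are not needed for this step).
df <- data.frame(
  id      = c(1, 1, 1, 2, 2, 2),
  country = c("US", "US", "US", "UK", "UK", "UK"),
  year    = c(2003, 2004, 2005, 2003, 2004, 2005)
)

# String key, built the same way as in the base R line above.
df$country_year <- paste0(df$country, "_", df$year)

# Hypothetical extra column: number each distinct key
# in order of first appearance.
df$country_year_id <- match(df$country_year, unique(df$country_year))

print(df)
```

match() against unique() keeps the numbering in row order, so US_2003 gets 1 and UK_2005 gets 6 for this sample. Approaches that sort the keys first (for example converting to a factor with default levels) would number the UK rows before the US rows instead.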
Upvotes: 1