Reputation: 2018
Subset of data frame:
country1 country2
Japan Japan
Netherlands <NA>
<NA> <NA>
Brazil Brazil
Russian Federation <NA>
<NA> <NA>
<NA> United States of America
Germany Germany
Ukraine <NA>
Japan Japan
<NA> Russian Federation
<NA> United States of America
France France
New Zealand New Zealand
Japan <NA>
I have two character vectors, country1
and country2
, which I would like to merge together into a new column. No observations in my dataset have different countries. However, some pairs have duplicated values which I would like only to display once. There is also the issue of the NAs, which I want to omit in the merged column, where each value in the new column only has the country string. A few observations have NAs in both of my columns, which I just want to leave as NA in the new column. I'm wondering what the best way to tackle this would be.
I've made a minor modification to the function in the top voted answer here with a similar question, changing the seperation of commas to nothing.
However, this leaves the repeating issue unsolved:
country1 country2 merge
Japan Japan JapanJapan
Netherlands <NA> Netherlands
<NA> <NA> <NA>
Brazil Brazil BrazilBrazil
Russian Federation <NA> Russian Federation
<NA> <NA> <NA>
<NA> United States of America United States of America
Germany Germany GermanyGermany
Ukraine <NA> Ukraine
Japan Japan JapanJapan
<NA> Russian Federation Russian Federation
<NA> United States of America United States of America
France France FranceFrance
New Zealand New Zealand New ZealandNew Zealand
Japan <NA> Japan
Upvotes: 0
Views: 44
Reputation: 47320
You can also just replace the NA
values from 1st column with values from the 2nd :
df$country1[is.na(df$country1)] <- df$country2[is.na(df$country1)]
Upvotes: 1
Reputation: 79228
Since you said you have character vectors, then:
library(tidyverse)
coalesce(country1,country2)
[1] "Japan" "Netherlands" NA
[4] "Brazil" "Russian Federation" NA
[7] "United States of America" "Germany" "Ukraine"
[10] "Japan" "Russian Federation" "United States of America"
[13] "France" "New Zealand" "Japan"
if its a dataframe. Just do coalesce(!!!df)
Upvotes: 1
Reputation: 5201
Since you specified dplyr
, here's a one-liner with it:
df <- dplyr::mutate(df, merge = dplyr::if_else(is.na(country1), country2, country1))
Data
country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA, "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America", "Germany", NA, "Japan", "Russian Federation", "United States of America", "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = F)
Upvotes: 1