Nautica
Nautica

Reputation: 2018

Merging two columns with NAs and duplicated values

Subset of data frame:

             country1                 country2
                Japan                    Japan
          Netherlands                     <NA>
                 <NA>                     <NA>
               Brazil                   Brazil
   Russian Federation                     <NA>
                 <NA>                     <NA>
                 <NA> United States of America
              Germany                  Germany
              Ukraine                     <NA>
                Japan                    Japan
                 <NA>       Russian Federation
                 <NA> United States of America
               France                   France
          New Zealand              New Zealand
                Japan                     <NA>

I have two character vectors, country1 and country2, which I would like to merge together into a new column. No observations in my dataset have different countries. However, some pairs have duplicated values which I would like only to display once. There is also the issue of the NAs, which I want to omit in the merged column, where each value in the new column only has the country string. A few observations have NAs in both of my columns, which I just want to leave as NA in the new column. I'm wondering what the best way to tackle this would be.

I've made a minor modification to the function in the top voted answer here with a similar question, changing the seperation of commas to nothing.

However, this leaves the repeating issue unsolved:

             country1                 country2                                            merge
                Japan                    Japan                                       JapanJapan
          Netherlands                     <NA>                                      Netherlands
                 <NA>                     <NA>                                             <NA>
               Brazil                   Brazil                                     BrazilBrazil
   Russian Federation                     <NA>                               Russian Federation
                 <NA>                     <NA>                                             <NA>
                 <NA> United States of America                         United States of America
              Germany                  Germany                                   GermanyGermany
              Ukraine                     <NA>                                          Ukraine
                Japan                    Japan                                       JapanJapan
                 <NA>       Russian Federation                               Russian Federation
                 <NA> United States of America                         United States of America
               France                   France                                     FranceFrance
          New Zealand              New Zealand                           New ZealandNew Zealand
                Japan                     <NA>                                            Japan

Upvotes: 0

Views: 44

Answers (3)

moodymudskipper
moodymudskipper

Reputation: 47320

You can also just replace the NA values from 1st column with values from the 2nd :

df$country1[is.na(df$country1)] <- df$country2[is.na(df$country1)]

Upvotes: 1

Onyambu
Onyambu

Reputation: 79228

Since you said you have character vectors, then:

library(tidyverse)
coalesce(country1,country2)
 [1] "Japan"                    "Netherlands"              NA                        
 [4] "Brazil"                   "Russian Federation"       NA                        
 [7] "United States of America" "Germany"                  "Ukraine"                 
[10] "Japan"                    "Russian Federation"       "United States of America"
[13] "France"                   "New Zealand"              "Japan"   

if its a dataframe. Just do coalesce(!!!df)

Upvotes: 1

C. Braun
C. Braun

Reputation: 5201

Since you specified dplyr, here's a one-liner with it:

df <- dplyr::mutate(df, merge = dplyr::if_else(is.na(country1), country2, country1))

Data

country1 <- c("Japan", "Netherlands", NA, "Brazil", "Russian Federation", NA, NA, "Germany", "Ukraine", "Japan", NA, NA, "France", "New Zealand", "Japan")
country2 <- c("Japan", NA, NA, "Brazil", NA, NA, "United States of America", "Germany", NA, "Japan", "Russian Federation", "United States of America", "France", "New Zealand", NA)
df <- data.frame(country1, country2, stringsAsFactors = F)

Upvotes: 1

Related Questions