Nat
Nat

Reputation: 79

Merge and replace character columns

I have a dataframe with 2 columns that I want to merge:

     Region             PA
1     Mbeya    Ruaha National Park
2     Mbeya    Ruaha National Park
3     Mbeya    Ruaha National Park
4     Mbeya    Ruaha National Park
5     Mbeya    Ruaha National Park
6     Mbeya    Ruaha National Park
7     Mbeya    NA
8     Mbeya    NA
9     Mbeya    NA
10    Mbeya    NA

This could be merged by either taking PA values and overwriting Region values in the rows, or replacing all NA's in PA with the values in Region for that row.

I've tried:

  Carcass.cleaned$New<-rowSums(Carcass.cleaned[, c("PA", "Region")], na.rm=T)
    Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) : 
      'x' must be numeric
    
    with(Carcass.cleaned,ifelse(is.na(PA),Region,PA))
    (returns list of numbers)
    
    and coalesce(Carcass.cleaned$PA, Carcass.cleaned$Region) 
    unite(Carcass.cleaned, new, PA:Region, sep='') 
 (both merge the columns names instead of replacing)

Upvotes: 2

Views: 59

Answers (4)

Ronak Shah
Ronak Shah

Reputation: 388797

You can replace for NA values in PA with corresponding Region value.

df$PA[is.na(df$PA)] <- df$Region[is.na(df$PA)]
df
#   Region                  PA
#1   Mbeya Ruaha National Park
#2   Mbeya Ruaha National Park
#3   Mbeya Ruaha National Park
#4   Mbeya Ruaha National Park
#5   Mbeya Ruaha National Park
#6   Mbeya Ruaha National Park
#7   Mbeya               Mbeya
#8   Mbeya               Mbeya
#9   Mbeya               Mbeya
#10  Mbeya               Mbeya

Upvotes: 0

akrun
akrun

Reputation: 886938

We could use coalesce from dplyr

library(dplyr)
df %>%
   mutate(PA = coalesce(PA, Region))
#   Region                  PA
#1   Mbeya Ruaha National Park
#2   Mbeya Ruaha National Park
#3   Mbeya Ruaha National Park
#4   Mbeya Ruaha National Park
#5   Mbeya Ruaha National Park
#6   Mbeya Ruaha National Park
#7   Mbeya               Mbeya
#8   Mbeya               Mbeya
#9   Mbeya               Mbeya
#10  Mbeya               Mbeya

Or using fcoalesce in data.table

library(data.table)
setDT(df)[, PA := fcoalesce(PA, Region)]

data

df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya", 
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA, 
-10L), class = "data.frame")

Upvotes: 1

Duck
Duck

Reputation: 39585

Try with mutate() from dplyr:

library(tidyr)
library(dplyr)
#Code
df <- df %>% group_by(Region) %>% 
  mutate(PA=ifelse(is.na(PA),Region,PA))

Output:

# A tibble: 10 x 2
# Groups:   Region [1]
   Region PA                 
   <chr>  <chr>              
 1 Mbeya  Ruaha National Park
 2 Mbeya  Ruaha National Park
 3 Mbeya  Ruaha National Park
 4 Mbeya  Ruaha National Park
 5 Mbeya  Ruaha National Park
 6 Mbeya  Ruaha National Park
 7 Mbeya  Mbeya              
 8 Mbeya  Mbeya              
 9 Mbeya  Mbeya              
10 Mbeya  Mbeya       

Some data used:

#Data
df <- structure(list(Region = c("Mbeya", "Mbeya", "Mbeya", "Mbeya", 
"Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya", "Mbeya"), PA = c("Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", "Ruaha National Park", 
"Ruaha National Park", "Ruaha National Park", NA, NA, NA, NA)), row.names = c(NA, 
-10L), class = "data.frame")

Upvotes: 2

peter
peter

Reputation: 786

You can use a simple if else statement:

df$Region <- ifelse(is.na(df$PA), df$Region, df$PA)

Basically wherever PA is NA you leave Region untouched, and where PA has a value you overwrite the value in Region. Afterwords you can delete PA if you want

Upvotes: 2

Related Questions