Amanda Goldberg
Amanda Goldberg

Reputation: 363

how to fill in empty data with data from another dataframe based on a column in R

I have two separate data frames and I need to use the 'B' data frame to fill in empty values in the 'A' data frame.

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))

#what I want

other site county stateprov country
v 1 ab 7 u
v 2 bc 8 u
v 3 de 9 c
v 4 eh 11 u
f 5 gi 12 u
f 6 fg 10 c

I'm not sure how fill in the NA's as join won't work in this case as far as I know. Thanks for your help

Upvotes: 0

Views: 2284

Answers (4)

langtang
langtang

Reputation: 24845

This is another approach

A = as.matrix(A)
B = as.matrix(merge(A[, c(1,2)], B, on="site",all.x=T)[,c(2,1,3,4,5)])
A[which(B!="NA")] <- B[which(B!="NA")]
data.frame(A) %>% dplyr::mutate(across(.cols = c(2,4),as.numeric))

Output:

  other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c

Upvotes: 0

Croote
Croote

Reputation: 1424

A join will work in this case. The data.table package especially does this well where you can update A by reference (see ?data.table).

Using data.table

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))

library(data.table)

setDT(A)
setDT(B)

# If you assume B always contains replacement for A
# refer to columns in B as i.[column_name], specify join column in on
A[B, c('county', 'stateprov', 'country') := .(i.county, i.stateprov, i.country), on = .(site)]

# Assuming you need to check if A is na first
# refer to columns in B as i.[column_name], refer to columns in A as x.[column name], specify join column in on

A[B, c('county', 'stateprov', 'country') := .(  fifelse(is.na(x.county), i.county, x.county)
                                              , fifelse(is.na(x.stateprov), i.stateprov, x.stateprov)
                                              , fifelse(is.na(x.country), i.country, x.country)
                                              )
                                              , on = .(site)]

A

   other site county stateprov country
1:     v    1     ab         7       u
2:     v    2     bc         8       u
3:     v    3     de         9       c
4:     v    4     eh        11       u
5:     f    5     gi        12       u
6:     f    6     fg        10       c

Another way using dplyr

library(dplyr)
# This also checks whether column in A is NA first
left_join(A, B, by = c('site')) %>% 
  mutate(
    county = case_when(is.na(county.x) ~ county.y, TRUE ~ county.x)
    , stateprov = case_when(is.na(stateprov.x) ~ stateprov.y, TRUE ~ stateprov.x)
    , country = case_when(is.na(country.x) ~ country.y, TRUE ~ country.x)
    ) %>% 
  select(-ends_with('.x'), -ends_with('.y'))
  
other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c

Upvotes: 0

Onyambu
Onyambu

Reputation: 79338

dplyr::rows_update(A, B)

  other site county stateprov country
1     v    1     ab         7       u
2     v    2     bc         8       u
3     v    3     de         9       c
4     v    4     eh        11       u
5     f    5     gi        12       u
6     f    6     fg        10       c

Upvotes: 2

Dave2e
Dave2e

Reputation: 24139

A simple loop should do it. Find the rows to replace, then loop through those rows and replace the A columns with B columns.

A<-data.frame(other=c('v','v','v','v','f','f'),
              site=c(1,2,3,4,5,6), 
              county=c('ab','bc','de','NA','NA','fg'), 
              stateprov=c(7,8,9,'NA','NA',10), 
              country=c('u','u','c','NA','NA','c'))

B<-data.frame(site=c(4,5), county=c('eh','gi'), 
              stateprov=c(11,12), country=c('u','u'))


rowToReplace <- which(A$county=="NA")

for(i in rowToReplace) {
    missingSite = A$site[i]
    A[i, c("county", "stateprov", "country")] <- B[which(B$site==missingSite), c("county", "stateprov", "country")]
}

Note, there is a difference between the string 'NA' and the NA value in R. The preference is to use NA instead of the string. NA has the advantage of working with the handy functions complete.cases() and is.na()

Upvotes: 0

Related Questions