Vegard Dyran
Vegard Dyran

Reputation: 79

Merge two data frames, but only include variables where there are no NAs

I have two data frames that I want to merge:

df1:

Date         Company    Return
1988-09-30   BELSHIPS   0.087
1988-10-31   BELSHIPS   0.021
1988-11-30   BELSHIPS   0.015
1988-12-30   BELSHIPS   -0.048
1988-09-30   GOODTECH   0.114
1988-10-31   GOODTECH   0.074
1988-11-30   GOODTECH   NA
1988-12-30   GOODTECH   NA
1988-09-30   LABOREMUS  -0.014
1988-10-31   LABOREMUS  0.024
1988-11-30   LABOREMUS  0.017
1988-12-30   LABOREMUS  0.021

df2:

Company
BELSHIPS
BIK BOK
FARSTAD SHIPPING
GOODTECH
GYLDENDAL

I want to merge the two data frames by Company, but I only want to include companies that have no NAs in return. The new data frame should therefore look like this:

df3:

Date         Company    Return
1988-09-30   BELSHIPS   0.087
1988-10-31   BELSHIPS   0.021
1988-11-30   BELSHIPS   0.015
1988-12-30   BELSHIPS   -0.048

Only the company BELSHIPS is included, because GOODTECH has NAs in Return and LABOREMUS is not included in df2.

I have tried df3 <- merge(df2, df1[!is.na(df1$Return)], by = "Company") which doesn't work, because it only omits the rows with NAs, not the entire company.

Any suggestions as to how I can fix this?

Upvotes: 1

Views: 71

Answers (4)

Rohit parihar
Rohit parihar

Reputation: 46

simply merge it and then use the function na.omit(merged df)

Upvotes: 0

tbradley
tbradley

Reputation: 2280

You can also do this using dplyr:

df2 %>%
  left_join(df1, by = "Company") %>%
  group_by(Company) %>% 
  filter(sum(is.na(Return)) == 0)

which gives you:

# A tibble: 4 x 3
# Groups:   Company [1]
   Company       Date Return
     <chr>     <fctr>  <dbl>
1 BELSHIPS 1988-09-30  0.087
2 BELSHIPS 1988-10-31  0.021
3 BELSHIPS 1988-11-30  0.015
4 BELSHIPS 1988-12-30 -0.048

Upvotes: 0

Andrew Haynes
Andrew Haynes

Reputation: 2640

Test Data:

df2 = data.frame(Company = c('BELSHIPS','GOODTECH'))                                                                                                                                                                             
df1 = data.frame(Company = c('BELSHIPS','BELSHIPS','BELSHIPS','GOODTECH','GOODTECH','GOODTECH','LABOREMUS','LABOREMUS','LABOREMUS'),Return = c(1,2,3,1,NA,NA,3,4,5) )                                                                                                                                                                             

Using which() and unique() to grab Companies with NA rows:

df3<-merge(df2, df1[df1$Company!=unique(df1[which(is.na(df1$Return)),'Company']),], by = 'Company')                                                                                                                                                                        

Upvotes: 1

pogibas
pogibas

Reputation: 28339

Base R solution:

# Select companies that have NA
# You can also use unique on this
foo <- df1$Company[is.na(df1$Return)]
# Subset data frame where Company is within df2 and doesn't have NA
subset(df1, Company %in% df2$Company & !Company %in% foo)

#         Date  Company Return
# 1 1988-09-30 BELSHIPS  0.087
# 2 1988-10-31 BELSHIPS  0.021
# 3 1988-11-30 BELSHIPS  0.015
# 4 1988-12-30 BELSHIPS -0.048

Upvotes: 2

Related Questions