Enjo Faes

Reputation: 45

Remove rows based on data in another dataframe?

I currently have a dataset with more than 186k observations (rows), shown in figure 1. Each row belongs to a company identified in the BVDID column, and every company should have data for all years from 2013 to 2017.

missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

However, using the code above I found 87k rows with zero values, which are now stored in the missingdata object.

How do I delete the rows of the series object whose BVDID (company code) appears in the missingdata data frame? I would also like to make the years look better in the output of str(series) and sort them in ascending order within each company code.

Best regards

Upvotes: 0

Views: 540

Answers (2)

AnilGoyal

Reputation: 26238

There are many ways to do this; here is one.

Use the anti_join function from dplyr (tidyverse). It works like the set operation A - B: it keeps only the rows of the first data frame that have no match in the second, so every row of series whose BVDID appears in missingdata is removed.

series %>% anti_join(missingdata, by = "BVDID")
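To see what the anti join does, here is a minimal sketch on made-up data. The three companies, the year column, and the values are assumptions for illustration only, not the asker's actual data.

```r
library(dplyr)

# Hypothetical toy data: three companies, two years each
series <- tibble(
  BVDID       = c("A", "A", "B", "B", "C", "C"),
  year        = c(2013, 2014, 2013, 2014, 2013, 2014),
  LIABILITIES = c(10, 12, 0, 5, 7, 8)
)

# Companies with at least one zero-liability row (same code as in the question)
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

# Keep only rows whose BVDID has no match in missingdata
cleaned <- series %>% anti_join(missingdata, by = "BVDID")

unique(cleaned$BVDID)
```

Company B has one zero-liability row, so both of its rows are dropped, including the non-zero one; only A and C remain.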

Or do it directly, without the missingdata object. LIABILITIES == 0 returns logical values, and the unary + converts them to 0 or 1, so the sum counts the zero-liability rows of each company. Companies with a count greater than 0 are the ones to remove, so keep only the groups where the count is 0.

series %>% group_by(BVDID) %>% filter(sum(+(LIABILITIES == 0)) == 0)
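As a sketch of the grouped approach that also covers the ordering part of the question, the example below keeps only companies with no zero-liability row and then sorts by company and year. It uses !any(...) as an equivalent way of writing the per-group condition. The toy data and the year column name are assumptions, not the asker's actual data.

```r
library(dplyr)

# Hypothetical toy data, deliberately unsorted
series <- tibble(
  BVDID       = c("B", "A", "C", "A", "B", "C"),
  year        = c(2014, 2014, 2013, 2013, 2013, 2014),
  LIABILITIES = c(5, 12, 7, 10, 0, 8)
)

cleaned <- series %>%
  group_by(BVDID) %>%
  # keep a company only if none of its rows has zero liabilities
  filter(!any(LIABILITIES == 0)) %>%
  ungroup() %>%
  # sort ascending by company code, then by year within each company
  arrange(BVDID, year)

cleaned
```

Calling ungroup() afterwards is optional here, but it avoids carrying the grouping into later steps by accident.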

Upvotes: 1

Radbys

Reputation: 400

series %>% 
  # drop every row whose BVDID appears in missingdata
  filter(!BVDID %in% pull(missingdata, BVDID)) %>% 
  # sort by company code, then by year within each company
  arrange(BVDID, year)

Upvotes: 0
