Remove rows based on data in another dataframe?

Question

I currently have a dataset with more than 186k observations (rows), shown in figure 1. Each company is identified by a code in the BVDID column, and every company should have data for all years from 2013 to 2017.

missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

However, using the code above I found 87k rows containing only zero values, which are now stored in the missingdata object.

How do I delete the rows of the series object whose BVDID (company code) appears in the missingdata data frame? I would also like the years to display more cleanly under str(series) and to be sorted in ascending order within each company code.

Best regards

AnilGoyal · Accepted Answer

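There are many ways to do this; here is one.

Use the anti_join function from dplyr (part of the tidyverse). It behaves like the set operation A - B: it returns the rows of the first data frame that have no match in the second, so every row of series whose BVDID appears in missingdata is dropped.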

series %>% anti_join(missingdata, by = "BVDID")
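To make the behaviour concrete, here is a minimal sketch on toy data. The YEAR column name and all values are assumptions made up for illustration; only BVDID and LIABILITIES come from the question.

```r
library(dplyr)

# Toy data: the YEAR column name and all values are hypothetical.
series <- tibble(
  BVDID       = c("A", "A", "B", "B", "C", "C"),
  YEAR        = c(2013, 2014, 2013, 2014, 2013, 2014),
  LIABILITIES = c(10, 12, 0, 5, 7, 8)
)

# Company codes with at least one zero-liability row, as in the question.
missingdata <- series %>%
  filter(LIABILITIES == 0) %>%
  select(BVDID)

# Keep only the rows of `series` whose BVDID does not appear in `missingdata`.
cleaned <- series %>%
  anti_join(missingdata, by = "BVDID")

unique(cleaned$BVDID)
```

Company B has one zero row and one non-zero row, and both are dropped, because anti_join matches on the company code rather than on the individual row.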

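Or do it directly, without the intermediate data frame. LIABILITIES == 0 returns logical values, and the unary + converts them to 0 or 1, so their sum within each company counts its zero rows. Companies where that sum is greater than 0 are the ones to remove, so keep only the groups where it equals 0.

series %>% group_by(BVDID) %>% filter(sum(+(LIABILITIES == 0)) == 0) %>% ungroup()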
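A grouped filter that keeps only the companies with no zero-liability rows can be sketched as follows, using any() in place of the sum. It also sorts the result by company code and year, which touches on the second part of the question. The YEAR column name and the values are assumptions, and the sketch assumes LIABILITIES contains no NA values.

```r
library(dplyr)

# Toy data in deliberately shuffled order; YEAR is a hypothetical column name.
series <- tibble(
  BVDID       = c("C", "A", "B", "A", "C", "B"),
  YEAR        = c(2014, 2014, 2013, 2013, 2013, 2014),
  LIABILITIES = c(8, 12, 0, 10, 7, 5)
)

cleaned <- series %>%
  group_by(BVDID) %>%
  # Keep a company only if none of its rows has zero liabilities.
  filter(!any(LIABILITIES == 0)) %>%
  ungroup() %>%
  # Sort ascending by company code, then by year within each company.
  arrange(BVDID, YEAR)

cleaned
```

Using !any(...) rather than a sum states the intent directly, but both forms select the same groups when there are no missing values.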
