Reputation: 45
I have a dataset with more than 186k observations (rows), shown in figure 1. Each company is identified by a code in the BVDID column, and every company should have data for each year from 2013 to 2017.
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)
However, using the code above I found 87k rows with zero values, which I stored in the missingdata object.
How do I delete the rows of the series object whose BVDID (company code) appears in the missingdata data frame? There should also be a way to make the years look better in the output of str(series) and to sort them in ascending order within each company code.
Best regards
Upvotes: 0
Views: 540
Reputation: 26238
There are many ways; here is one.
Use the tidyverse anti_join function, which works like the set operation A - B: it keeps only the rows of the first data frame that have no match in the second, so every row whose BVDID appears in missingdata is removed.
series %>% anti_join(missingdata, by = "BVDID")
Or filter directly, without missingdata. LIABILITIES == 0 returns logical values, and the + before it converts these to 0 or 1. Summing them per company counts that company's zero rows; companies where the sum is greater than 0 are the ones to remove, so keep only the groups where the sum equals 0.
series %>% group_by(BVDID) %>% filter(sum(+(LIABILITIES == 0)) == 0)
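As a quick sanity check, the two approaches can be compared on a small made-up data frame. The values below are hypothetical and only the column names come from the question; the grouped filter here keeps the companies whose count of zero rows equals 0.

```r
library(dplyr)

# Hypothetical toy data: company "B" has one year with zero liabilities.
series <- tibble(
  BVDID = c("A", "A", "B", "B", "C", "C"),
  year = c(2014, 2013, 2013, 2014, 2014, 2013),
  LIABILITIES = c(10, 12, 0, 5, 7, 8)
)

# Company codes with at least one zero row, as in the question.
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

# Approach 1: drop every row whose BVDID appears in missingdata.
via_join <- series %>% anti_join(missingdata, by = "BVDID")

# Approach 2: keep only companies that have no zero rows at all.
via_group <- series %>%
  group_by(BVDID) %>%
  filter(sum(LIABILITIES == 0) == 0) %>%
  ungroup()

# Both results should contain only companies "A" and "C".
print(via_join)
print(via_group)
```

Note that both approaches drop all years of a company as soon as one year is zero, not just the zero rows themselves.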
Upvotes: 1
Reputation: 400
series %>%
# filter out the BVDIDs from missingdata
filter(!BVDID %in% pull(missingdata)) %>%
# order the df
arrange(BVDID, year)
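A minimal sketch of the same pipeline on made-up data, in case it helps to see the result. The question does not say how the years are stored; converting them to integer is only a guess at what "look better under str(series)" means, and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: years stored as character and out of order.
series <- tibble(
  BVDID = c("B", "A", "A", "B"),
  year = c("2014", "2014", "2013", "2013"),
  LIABILITIES = c(5, 10, 12, 0)
)
missingdata <- series %>% filter(LIABILITIES == 0) %>% select(BVDID)

cleaned <- series %>%
  # pull() with no column argument takes the last (here: the only) column
  filter(!BVDID %in% pull(missingdata)) %>%
  # assumption: years are whole numbers, so integer is a natural type
  mutate(year = as.integer(year)) %>%
  # sort by company code, then by year within each company
  arrange(BVDID, year)

str(cleaned)
```

If year is already numeric in the real data, the mutate() step can simply be left out.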
Upvotes: 0