Reputation: 1331
Below is the list of packages, sample data, and the script that I am running. Below that is the schema. You will notice that two of the values are above 500 and therefore do not fit the schema. The desired result would only take into account those that fit the schema (employing less than 500). When I run this on my larger data set(not the sample data set below), I get the a result that is like what is found at the bottom. In short, how would I modify the script so that it leaves out the entries that are greater than 500 and therefore does not return a fifth row of NA?
library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)
firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm,employment,small)
smbsummary2<-smbtest %>%
select(employment,small) %>%
group_by(small) %>%
summarise(employment = sum(employment), worksites = n(),
.groups = 'drop') %>%
mutate(employment = cumsum(employment),
worksites = cumsum(worksites))
smb1 >= 0 and <100
smb2 >= 0 and <150
smb3 >= 0 and <250
smb4 >= 0 and <500
smb employment worksites
1 1000 20
2 1500 22
3 2500 25
4 10000 29
5 25000 NA
Upvotes: 0
Views: 83
Reputation: 145
here I believe this would help
firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)
smbtest <- data.frame(firm,employment,small)
smbtest %>%
select(employment,small) %>%
group_by(small) %>%
summarise(employment = sum(employment), worksites = n(),
.groups = 'drop') %>%
mutate(employment = cumsum(employment),
worksites = cumsum(worksites)) %>% drop_na() %>% filter(employment < 500)
I've just added two lines of syntax
Upvotes: 1