Tim Wilcox
Tim Wilcox

Reputation: 1331

How to account for NA's in R Dplyr

Below is the list of packages, sample data, and the script that I am running. Below that is the schema. You will notice that two of the values are above 500 and therefore do not fit the schema. The desired result would only take into account those that fit the schema (employing less than 500). When I run this on my larger data set(not the sample data set below), I get the a result that is like what is found at the bottom. In short, how would I modify the script so that it leaves out the entries that are greater than 500 and therefore does not return a fifth row of NA?

library(dplyr)
library(data.table)
library(odbc)
library(DBI)
library(stringr)

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm,employment,small)

smbsummary2<-smbtest %>% 
select(employment,small) %>%
group_by(small) %>%
summarise(employment = sum(employment), worksites = n(), 
        .groups = 'drop') %>% 
mutate(employment = cumsum(employment),
     worksites = cumsum(worksites))

smb1     >= 0 and <100
smb2     >= 0 and <150
smb3     >= 0 and <250
smb4     >= 0 and <500

smb      employment   worksites
 1           1000         20
 2           1500         22
 3           2500         25
 4           10000        29
 5           25000        NA

Upvotes: 0

Views: 83

Answers (1)

Arslan Sh.
Arslan Sh.

Reputation: 145

here I believe this would help

firm <- c("firm1","firm2","firm3","firm4","firm5","firm6","firm7","firm8","firm9","firm10","firm11")
employment <- c(1,50,90,249,499,115,145,261,210,874,1140)
small <- c(1,1,1,3,4,2,2,4,3,NA,NA)

smbtest <- data.frame(firm,employment,small)

smbtest %>% 
select(employment,small) %>%
group_by(small) %>%
summarise(employment = sum(employment), worksites = n(), 
        .groups = 'drop') %>% 
 mutate(employment = cumsum(employment),
     worksites = cumsum(worksites)) %>% drop_na() %>% filter(employment < 500)

I've just added two lines of syntax

  • "drop_na"
  • "filter(employment < 500)

Upvotes: 1

Related Questions