Edwin lee

Filtering rows with the latest year and converting internet speed in R

This is my first time posting a question on Stack Overflow. I have been struggling with the following tasks:

  1. Convert "Download.Speed" from Gbps to Mbps in the rows where "Speed.Unit" is "Gbps".

Using the pipe operator, I get this error:

Error in erate_data %>% filter(Speed.Unit == "Gbps") %>% as.numeric(erate_data$Download.Speed) : 
  'list' object cannot be coerced to type 'double'

Using an ifelse() statement instead, I get a large list and this warning:

Warning message:
In ifelse(erate_data$Speed.Unit == "Gbps", as.numeric(erate_data$Download.Speed) *  :
  NAs introduced by coercion

  2. For each library, keep only the row with the latest funding year and, within that year, the minimum Monthly.Cost (or maximum download speed), to remove duplicate entries for the same year. I get no error message, but the result has 0 observations.

library(dplyr)

erate_data <- read.csv('E-Rate_Details.csv', stringsAsFactors = FALSE)

#convert gbps to mbps trial 1
gbps_mbps <- erate_data %>%
  filter(Speed.Unit == "Gbps") %>%
  as.numeric(erate_data$Download.Speed) * 1024

#convert gbps to mbps trial 2
gbps_mbps <- ifelse(erate_data$Speed.Unit == "Gbps", as.numeric(erate_data$Download.Speed) * 1024, erate_data)

# filter latest year with lowest FRN monthly cost
library_latest <-
  erate_data %>%
  filter(Funding.Year == max(Funding.Year) & Monthly.Cost == min(Monthly.Cost))
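
For reference, the behaviour of the first two attempts can be reproduced on a small data frame. This is a sketch only: `toy` is a hypothetical stand-in shaped like erate_data, not the real E-Rate file. The pipe passes its left-hand side as the first argument of the next call, so trial 1 hands a whole filtered data frame to as.numeric() (the original also passes erate_data$Download.Speed as an extra argument). In trial 2, ifelse() fills its result element by element from `yes` and `no`, so passing the entire data frame as `no` turns the result into a list.

```r
library(dplyr)

# Hypothetical toy data shaped like erate_data (not the real E-Rate file)
toy <- data.frame(
  Download.Speed = c(100, 1, 2),
  Speed.Unit = c("Mbps", "Gbps", "Gbps"),
  stringsAsFactors = FALSE
)

# Trial 1: `x %>% f()` evaluates as f(x), so as.numeric() receives a data frame.
# A data frame is a list, and a list whose elements have length > 1
# cannot be coerced to a double vector, hence the error.
err <- tryCatch(
  toy %>% filter(Speed.Unit == "Gbps") %>% as.numeric(),
  error = function(e) conditionMessage(e)
)
print(err)

# Trial 2: passing the data frame itself as `no` makes ifelse() return a list
res_list <- ifelse(toy$Speed.Unit == "Gbps", toy$Download.Speed * 1024, toy)
str(res_list)

# Passing the column as `no` keeps a plain numeric vector
res_num <- ifelse(toy$Speed.Unit == "Gbps", toy$Download.Speed * 1024, toy$Download.Speed)
print(res_num)
```

The "NAs introduced by coercion" warning is not reproduced here; it suggests the real Download.Speed column contains some values that as.numeric() cannot parse.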

Any help or guidance would be much appreciated. I attached a screenshot for reference.

Input

dput(erate_data)

structure(list(Entity.Name = c("115TH STREET BRANCH LIBRARY", "115th Street Branch Library", "125th Street Branch Library", "320th Federal Way Library", "320th Federal Way Library", "53rd Street Library", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY"), Zip.Code = c(10026L, 10026L, 10035L, 98003L, 98003L, 10019L, 94621L, 94621L, 94621L), Funding.Year = c(2016L, 2019L, 2019L, 2019L, 2019L, 2019L, 2016L, 2017L, 2017L), Download.Speed = c(40, 200, 200, 100, 1, 1, 50, 1.544, 1.544), Speed.Unit = c("Mbps", "Mbps", "Mbps", "Mbps", "Gbps", "Gbps", "Mbps", "Mbps", "Mbps" ), Monthly.Cost = c("1,365", "1,207.50", "1,207.50", "876", "1,380", "2,126.25", "961.01", "26.12", "158.5")), class = "data.frame", row.names = c(NA, -9L))

Desired Output

dput(erate_data)

structure(list(Entity.Name = c("115th Street Branch Library", "125th Street Branch Library", "320th Federal Way Library", "53rd Street Library", "81ST AVENUE BRANCH LIBRARY"), Zip.Code = c(10026L, 10035L, 98003L, 10019L, 94621L), Funding.Year = c(2019L, 2019L, 2019L, 2019L, 2017L), Download.Speed = c(200, 200, 100, 1024, 1.544), Speed.Unit = c("Mbps", "Mbps", "Mbps", "Mbps", "Mbps"), Monthly.Cost = c("1,207.50", "1,207.50", "876", "2,126.25", "26.12")), row.names = c(NA, 5L), class = "data.frame", na.action = structure(6:20, names = c("6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20"), class = "omit"))


Answers (1)

jared_mamrot

Thanks for editing your question to include an MRE. Two issues stand out to me and might be the cause of your problem. First, the "Monthly.Cost" column uses commas as a thousands separator (e.g. "1,234.00"), so R reads it as text. To sort or compare rows by cost, R needs to interpret these values as numbers. The parse_number() function in the readr package handles the conversion from "1,234.00" to 1234. readr is part of the tidyverse, so it is loaded when you load the tidyverse package.

Second, the "Entity.Name" values use different cases, i.e. all upper case vs Title Case. One way to address this is to convert all of the names to upper case with toupper(), but this may or may not be suitable depending on your use case (up to you).
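
A minimal sketch of both points, using a few of the question's values (the variable names are made up for illustration). It also shows why the raw strings mislead: they are compared as text, not as amounts. The gsub() line is a base-R alternative for anyone not using readr.

```r
library(readr)

# Monthly.Cost values as they appear in the question: text with commas
cost_chr <- c("1,365", "1,207.50", "876", "26.12")

# As text, values compare alphabetically: "876" sorts after "1,365"
"876" > "1,365"

# readr::parse_number() drops the thousands separator and returns numbers
cost_num <- parse_number(cost_chr)
print(cost_num)

# Base-R alternative without readr: strip the commas, then convert
cost_base <- as.numeric(gsub(",", "", cost_chr, fixed = TRUE))
print(cost_base)

# Upper-casing the names makes differently-cased duplicates identical
names_chr <- c("115TH STREET BRANCH LIBRARY", "115th Street Branch Library")
unique(toupper(names_chr))
```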

Here is a potential solution that I hope solves your issues:

library(tidyverse)
erate_data <- structure(list(Entity.Name = c("115TH STREET BRANCH LIBRARY", "115th Street Branch Library", "125th Street Branch Library", "320th Federal Way Library", "320th Federal Way Library", "53rd Street Library", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY"), Zip.Code = c(10026L, 10026L, 10035L, 98003L, 98003L, 10019L, 94621L, 94621L, 94621L), Funding.Year = c(2016L, 2019L, 2019L, 2019L, 2019L, 2019L, 2016L, 2017L, 2017L), Download.Speed = c(40, 200, 200, 100, 1, 1, 50, 1.544, 1.544), Speed.Unit = c("Mbps", "Mbps", "Mbps", "Mbps", "Gbps", "Gbps", "Mbps", "Mbps", "Mbps" ), Monthly.Cost = c("1,365", "1,207.50", "1,207.50", "876", "1,380", "2,126.25", "961.01", "26.12", "158.5")), class = "data.frame", row.names = c(NA, -9L))

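erate_data %>%
  # turn "1,207.50"-style text into numbers
  mutate(Monthly.Cost = parse_number(Monthly.Cost)) %>%
  # express Gbps speeds in Mbps (factor 1024, as in the desired output)
  mutate(Download.Speed = ifelse(Speed.Unit == "Gbps",
                                 Download.Speed * 1024, 
                                 Download.Speed)) %>%
  select(-Speed.Unit) %>%
  # group case-insensitively so "115TH ..." and "115th ..." match
  group_by(toupper(Entity.Name)) %>%
  # keep the cheapest row per library (equivalent to slice_min(Monthly.Cost))
  slice_max(order_by = desc(Monthly.Cost))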
#> # A tibble: 5 × 6
#> # Groups:   toupper(Entity.Name) [5]
#>   Entity.Name Zip.Code Funding.Year Download.Speed Monthly.Cost `toupper(Entit…`
#>   <chr>          <int>        <int>          <dbl>        <dbl> <chr>           
#> 1 115th Stre…    10026         2019         200          1208.  115TH STREET BR…
#> 2 125th Stre…    10035         2019         200          1208.  125TH STREET BR…
#> 3 320th Fede…    98003         2019         100           876   320TH FEDERAL W…
#> 4 53rd Stree…    10019         2019        1024          2126.  53RD STREET LIB…
#> 5 81ST AVENU…    94621         2017           1.54         26.1 81ST AVENUE BRA…

Created on 2022-07-12 by the reprex package (v2.0.1)
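
The pipeline above keeps the cheapest row per library and never looks at Funding.Year. On this sample the cheapest row also comes from the latest year, but that will not hold for every dataset. The question's own filter probably returned no rows because max(Funding.Year) and min(Monthly.Cost) were computed over the whole table rather than per library, and min() on the character costs picks the alphabetically smallest string. Below is a minimal sketch that filters on each library's latest year first, assuming dplyr >= 1.0 (for slice_min()); `Entity.Key` is a hypothetical helper column. Note also that network bit rates normally use decimal prefixes (1 Gbps = 1000 Mbps); the 1024 factor here only follows the question's desired output.

```r
library(dplyr)
library(readr)

# Sample data from the question
erate_data <- structure(list(Entity.Name = c("115TH STREET BRANCH LIBRARY", "115th Street Branch Library", "125th Street Branch Library", "320th Federal Way Library", "320th Federal Way Library", "53rd Street Library", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY", "81ST AVENUE BRANCH LIBRARY"), Zip.Code = c(10026L, 10026L, 10035L, 98003L, 98003L, 10019L, 94621L, 94621L, 94621L), Funding.Year = c(2016L, 2019L, 2019L, 2019L, 2019L, 2019L, 2016L, 2017L, 2017L), Download.Speed = c(40, 200, 200, 100, 1, 1, 50, 1.544, 1.544), Speed.Unit = c("Mbps", "Mbps", "Mbps", "Mbps", "Gbps", "Gbps", "Mbps", "Mbps", "Mbps"), Monthly.Cost = c("1,365", "1,207.50", "1,207.50", "876", "1,380", "2,126.25", "961.01", "26.12", "158.5")), class = "data.frame", row.names = c(NA, -9L))

latest <- erate_data %>%
  mutate(
    Monthly.Cost = parse_number(Monthly.Cost),
    # convert first, while Speed.Unit still says "Gbps"
    Download.Speed = ifelse(Speed.Unit == "Gbps", Download.Speed * 1024, Download.Speed),
    Speed.Unit = "Mbps",
    Entity.Key = toupper(Entity.Name)  # hypothetical helper column for grouping
  ) %>%
  group_by(Entity.Key) %>%
  # keep only each library's latest funding year...
  filter(Funding.Year == max(Funding.Year)) %>%
  # ...then the single cheapest row within that year
  slice_min(Monthly.Cost, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-Entity.Key)

print(latest)
```

Swapping slice_min(Monthly.Cost, ...) for slice_max(Download.Speed, ...) would instead keep the fastest connection in the latest year.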
