I have a filter() call that filters a tibble by a specific regex. Because I need to do this more than once, I want to write an as_mapper() function to shorten my code. How can I do this?
I have tried the following:
adverts <- as_mapper(~!grepl("(xtm)|((k|K)(i|I|1|11)(d|D)(n|N).)|(Ar<e)\\s(you)\\s(in)|
(LOAN)|(AR(\\s|\\S)[0-9])|((B|b)(i|1|l)tc.)|(Coupon)|(Plastic.King)|(organs)|(SILI)|(Electric.Cigarette.Machine)",.$value,perl = T)%>% filter)
If I apply this function to a tibble, R throws a "C stack usage x is too close to the limit" error. How can I avoid this?
One of the tibbles I want to check can be generated with the following code:
library(tidyverse)
library(rvest)
library(textreadr)
bribe <- read_html(paste("http://ipaidabribe.com/reports/paid?page", 10, sep = "="))
all.nodes <- c(".heading-3 a",".paid-amount span", ".date", ".location", ".transaction a")
test <- map(all.nodes, ~ html_nodes(bribe, .x) %>% html_text()) %>%
unlist %>%
as_tibble
adverts(test)
EDIT: a shorter version
You can write a simple one-line function that does not rely on as_mapper():
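# paste0() joins the pieces without putting a line break inside the pattern
target_regex <- paste0(
  "(xtm)|((k|K)(i|I|1|11)(d|D)(n|N).)|(Ar<e)\\s(you)\\s(in)|",
  "(LOAN)|(AR(\\s|\\S)[0-9])|((B|b)(i|1|l)tc.)|(Coupon)|(Plastic.King)|(organs)|(SILI)|(Electric.Cigarette.Machine)"
)
# Keep only the rows whose column `col` does NOT match the regex
adverts <- function(df, col) df[!grepl(target_regex, df[[col]], perl = TRUE), ]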
test_df %>% adverts(col = "value")
This returns only the rows of df in which the regex is not found.
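If you do want to keep as_mapper(), as the question asks, here is a minimal sketch. Instead of piping the output of grepl() into filter(), the formula receives the tibble as .x and calls filter() on it directly. The short pattern and the toy tibble are hypothetical stand-ins for target_regex and test_df.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical short pattern for illustration; in practice use target_regex
pattern <- "(Coupon)|(organs)"

# as_mapper() turns the one-sided formula into a function of .x;
# filter() receives the tibble and the negated grepl() drops matching rows
adverts_mapper <- as_mapper(~ filter(.x, !grepl(pattern, value, perl = TRUE)))

# Hypothetical toy tibble with the same single `value` column as test_df
toy <- tibble(value = c("Free Coupon inside", "Paid a bribe at RTO", "Selling organs"))

adverts_mapper(toy)
```

On the toy tibble this should keep only the row that matches neither alternative.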
I don't think you need a mapper here: you can simply write a normal function that takes a tibble, a target regex and a column name, and returns only the rows whose column matches the regex. One possibility is:
filter_regex <- function(df, regex, col){
  df %>%
    # flag the rows whose column `col` matches the regex
    mutate(found = grepl(pattern = regex, x = df[[col]])) %>%
    # keep the matching rows, then drop the helper column
    filter(found) %>%
    select(-found)
}
test_df <- map(all.nodes, ~ html_nodes(bribe, .x) %>% html_text()) %>%
unlist %>%
as_tibble
# paste0() joins the pieces without putting a line break inside the pattern
target_regex <- paste0(
  "(xtm)|((k|K)(i|I|1|11)(d|D)(n|N).)|(Ar<e)\\s(you)\\s(in)|",
  "(LOAN)|(AR(\\s|\\S)[0-9])|((B|b)(i|1|l)tc.)|(Coupon)|(Plastic.King)|(organs)|(SILI)|(Electric.Cigarette.Machine)"
)
filter_regex(test_df, target_regex, "value")
> # A tibble: 7 x 1
> value
> <chr>
> 1 "\r\n Kidney Donor Needed Urgently Needed\r\n "
> 2 "\r\n Kidney Donor Needed Urgetly \r\n "
> 3 "\r\n Urgent Kidney Donor Needed\r\n "
> 4 "\r\n Urgent Kidney Donor Needed\r\n "
> 5 "\r\n Kidney Donor Needed\r\n "
> 6 "\r\n Kidney Donor Needed\r\n "
> 7 "\r\n Kidney donation urgently needed in India for 7 CR\r\n
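filter_regex() keeps the rows that match, while the question's !grepl() drops them. A hedged sketch of a hypothetical variant, filter_regex2(), adds an invert argument to cover both cases. It runs offline on a small made-up tibble, so it can be tried without scraping the site; the sample strings and the shortened pattern (two alternatives taken from target_regex) are assumptions for illustration.

```r
library(tibble)

# Hypothetical variant of filter_regex() with an `invert` argument:
# invert = FALSE keeps rows whose column matches (like filter_regex()),
# invert = TRUE drops them (like the question's !grepl())
filter_regex2 <- function(df, regex, col, invert = FALSE) {
  found <- grepl(pattern = regex, x = df[[col]], perl = TRUE)
  if (invert) found <- !found
  df[found, ]
}

# Made-up sample data with a `value` column, as in test_df
toy <- tibble(value = c("Urgent Kidney Donor Needed",
                        "Paid bribe for passport",
                        "Bitcoin offer"))
pattern <- "((k|K)(i|I|1|11)(d|D)(n|N).)|((B|b)(i|1|l)tc.)"

filter_regex2(toy, pattern, "value")                 # the two advert rows
filter_regex2(toy, pattern, "value", invert = TRUE)  # the remaining report
```

Computing the logical vector outside filter() also avoids any clash if the tibble already has a column called found.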