Reputation: 21
I am interested in one specific column of a data frame, where each row contains the name of a neighborhood and a number assigned to that neighborhood, for example:
TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)
Please see this image for a better understanding: [image: neighborhoodnum]
I only want to extract the first bracketed number, e.g. 20 from:
TOR - HOOD - Alderwood (20) ( 25.4%)
I have tried using the stringr package, but all the functions only take one string at a time. There are 140 rows in this column and I want the values from all of them. I am not sure how to go through every string in the column.
Here is what I have tried. This code gave the following error:
Error in UseMethod("type") : no applicable method for 'type' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
hood_data<-tibble(hood=demo_edu_dataset$Geography)
head(hood_data)
hoodnum<-hood_data %>%
#separate(hood, into= c("name", "number"), sep = "")
stringr::str_extract_all(hood_data, "\\d")
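The error most likely comes from the pipe: `%>%` passes its left-hand side as the first argument, so the last line effectively runs `str_extract_all(hood_data, hood_data, "\\d")`. The tibble ends up as the `pattern`, which stringr cannot dispatch on, and that matches the `type` error above. Also, `"\\d"` matches a single digit and `str_extract_all()` returns every match, so "42" would come back as "4" and "2" plus the digits of the percentage. stringr functions are vectorised, so passing the column itself is enough. A minimal sketch, assuming a `hood` column as built in the question and neighborhood names that contain no digits (the two sample rows are copied from the question):

```r
library(stringr)

# Hypothetical stand-in for demo_edu_dataset$Geography, using values from the question
hood_data <- data.frame(hood = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
                                 "TOR - HOOD - Alderwood (20) ( 25.4%)"))

# Pass the column (a character vector), not the whole data frame.
# str_extract() is vectorised: it returns the first run of digits in every row.
numbers <- as.numeric(str_extract(hood_data$hood, "\\d+"))
numbers
```

If a name could contain digits, anchoring the pattern to the brackets (as in the answers) is safer than a bare `"\\d+"`.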
Thank You
Upvotes: 0
Views: 95
Reputation: 21400
Or use str_extract from the stringr package together with a positive lookbehind and lookahead:
str_extract(YOURDATA, "(?<=\\()\\d{1,}(?=\\))")
This regex says: "when you see ( on the left and ) on the right, match the number with at least 1 digit in the middle".
If you wrap as.numeric around the whole expression, the numbers are converted from character to numeric:
as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
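To see the lookaround version end to end, here is a minimal sketch on a small data frame. The column name `X` follows the answer's own `df$X`; the sample rows are copied from the question and the thread. Because lookarounds are zero-width, only the digits are returned, so no bracket cleanup is needed before `as.numeric()`:

```r
library(stringr)

# Hypothetical sample column built from values quoted in this thread
df <- data.frame(X = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
                       "TOR - HOOD - Alderwood (20) ( 25.4%)",
                       "TOR - HOOD - Annex (95) ( 27.9%)"))

# (?<=\()  lookbehind: the digits must be preceded by "("
# \d{1,}   one or more digits (the only part that is returned)
# (?=\))   lookahead: the digits must be followed by ")"
# "( 23.6%)" never matches: a space follows "(" and the content is not all digits
df$number <- as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
df$number
```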
Upvotes: 0
Reputation: 21
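library(tidyr)
hoodnum <- hood_data %>%
  # "name" = text before the first "(", "number" = e.g. "42) "; extra = "drop" discards " 23.6%)"
  separate(hood, into = c("name", "number"), sep = "\\(", extra = "drop")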
This worked
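One caveat with splitting on `"\\("`: each string yields three pieces (the name, `"42) "` and `" 23.6%)"`), so the `number` column still holds text such as `"42) "`, which `as.numeric()` turns into `NA`. Staying in tidyr, `extract()` can pull out just the bracketed digits in one step (recent tidyr versions mark it superseded by `separate_wider_regex()`, but it still works). A minimal sketch, assuming a data frame with a `hood` column as built in the question; the sample rows are copied from the question:

```r
library(tidyr)

# Hypothetical sample rows copied from the question
hood_data <- data.frame(hood = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
                                 "TOR - HOOD - Alderwood (20) ( 25.4%)"))

# The single capture group ([0-9]+) becomes the new "number" column.
# remove = FALSE keeps the original text; convert = TRUE turns "42" into a number.
hoodnum <- extract(hood_data, hood, into = "number",
                   regex = "\\(([0-9]+)\\)", remove = FALSE, convert = TRUE)
hoodnum
```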
Upvotes: 1
Reputation: 101247
Maybe you can try gsub like below, for example:
df <- data.frame(X = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
"TOR - HOOD - Annex (95) ( 27.9%)"))
df$Y <- as.numeric(gsub(".*?\\((\\w+)\\).*","\\1",df$X))
such that
> df
X Y
1 TOR - HOOD - Alderwood (20) ( 25.4%) 20
2 TOR - HOOD - Annex (95) ( 27.9%) 95
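One thing to watch with the `gsub()` approach: if a row has no `(digits)` group, the pattern does not match, `gsub()` returns the whole string unchanged, and `as.numeric()` yields `NA` with a coercion warning. A minimal base-R sketch that returns `NA` quietly for such rows and accepts only digits (rather than `\\w+`) inside the brackets. `first_bracket_number()` is a hypothetical helper, and the third sample row is made up to show the no-match case:

```r
# Hypothetical helper: number inside the first "(digits)" of each string, NA if none
first_bracket_number <- function(x) {
  m <- regexpr("\\([0-9]+\\)", x)     # start of the first "(digits)" match, -1 if none
  out <- rep(NA_real_, length(x))
  hit <- m != -1
  # regmatches() returns only the matched text (e.g. "(20)") for rows that matched
  out[hit] <- as.numeric(gsub("[()]", "", regmatches(x, m)))
  out
}

x <- c("TOR - HOOD - Alderwood (20) ( 25.4%)",
       "TOR - HOOD - Annex (95) ( 27.9%)",
       "Hypothetical row without a number ( 26.1%)")
first_bracket_number(x)
```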
Upvotes: 0