Reputation: 21

R: Extracting numerical values from strings in a column

I am interested in one specific column of a dataframe, where each row contains the name of a neighborhood and a number assigned to that neighborhood, for example:

TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)

Please see this image for a better understanding: [image: neighborhoodnum]

I only want to extract the first bracketed number from each string, e.g. 20 from: TOR - HOOD - Alderwood (20) ( 25.4%)

I have tried using the stringr package, but all the functions seem to take only one string at a time. There are 140 rows in this column and I want the values from all the rows. I am not sure how to go through every string in the column.

Here is what I have tried. The code below gave this error: Error in UseMethod("type") : no applicable method for 'type' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"

hood_data<-tibble(hood=demo_edu_dataset$Geography)
head(hood_data)

hoodnum<-hood_data %>%
  #separate(hood, into= c("name", "number"), sep = "")
  stringr::str_extract_all(hood_data, "\\d")

Thank you

Upvotes: 0

Answers (3)

Chris Ruehlemann

Reputation: 21400

Or use str_extract from the stringr package as well as positive lookbehind and lookahead:

str_extract(YOURDATA, "(?<=\\()\\d{1,}(?=\\))")

This regex says: "when you see ( on the left and ) on the right, match the number with at least one digit in the middle". If you wrap as.numeric around the whole expression, the matches are converted from character to numeric:

as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
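Since the question asks how to process every row, here is a minimal sketch of the same call applied to a whole column. The data frame df, its column X, and the new column name number are made up for illustration; the two strings are the ones quoted in the question.

```r
library(stringr)

# Hypothetical sample data using the two strings shown in the question
df <- data.frame(
  X = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
        "TOR - HOOD - Alderwood (20) ( 25.4%)"),
  stringsAsFactors = FALSE
)

# str_extract() is vectorised: given a character vector, it returns
# the first match for each element, so no explicit loop is needed.
# \\d+ is equivalent to \\d{1,}
df$number <- as.numeric(str_extract(df$X, "(?<=\\()\\d+(?=\\))"))
```

The percentage in the second pair of brackets is not picked up, because the opening bracket there is followed by a space rather than a digit.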

Upvotes: 0

Faria Khandaker

Reputation: 21

hoodnum<-hood_data %>%
 separate(Geography, into= c("name", "number"), sep = "\\(")

This worked
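Note that splitting on "\\(" produces three pieces for these strings, so separate() may warn about discarded pieces and the number column keeps the trailing ") ". As a possible alternative (a different function, not the approach above), tidyr::extract() can capture the name and the first bracketed number directly. This is only a sketch: the data frame below is invented from the strings in the question, and the column name Geography is assumed from the code above.

```r
library(tidyr)

# Hypothetical sample data; column name Geography assumed from the answer above
hood_data <- data.frame(
  Geography = c("TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)",
                "TOR - HOOD - Alderwood (20) ( 25.4%)"),
  stringsAsFactors = FALSE
)

# Group 1: everything before the first "(digits)", without trailing spaces
# Group 2: the digits inside that first pair of brackets
# convert = TRUE turns the captured digits into an integer column
hoodnum <- tidyr::extract(
  hood_data, Geography,
  into = c("name", "number"),
  regex = "^(.*?)\\s*\\((\\d+)\\)",
  convert = TRUE
)
```

Calling it as tidyr::extract() avoids a clash with magrittr::extract() if both packages are attached.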

Upvotes: 1

ThomasIsCoding

Reputation: 101247

Maybe you can try gsub as shown below, for example:

df <- data.frame(X = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                       "TOR - HOOD - Annex (95) ( 27.9%)"))

df$Y <- as.numeric(gsub(".*?\\((\\w+)\\).*","\\1",df$X))

such that

> df
                                     X  Y
1 TOR - HOOD - Alderwood (20) ( 25.4%) 20
2     TOR - HOOD - Annex (95) ( 27.9%) 95

Upvotes: 0
