Reputation: 95
I have some data that contains a set of variables whose values mix numbers and characters, such as "1 - Completely Disagree." I would like to remove the characters and keep only the numbers, while retaining the variables in the original dataset under their original names. For example, I have a variable called "systematic1" with the value "1 - Completely Disagree". I would like it to become just "1", in numeric form, within the original dataset. This is what I have:
systematic1 | systematic2 | systematic3 |
---|---|---|
1 - Completely Disagree | 7 - Completely Agree | 7 - Completely Disagree |
5 - Somewhat Agree | 4 - Neither Agree nor Disagree | 6 - Agree |
This is the desired output:
systematic1 | systematic2 | systematic3 |
---|---|---|
1 | 7 | 7 |
5 | 4 | 6 |
I've been able to mostly accomplish this using the following code:
data %>%
  select(systematic1:withdrawn) %>%  # select range
  select_if(is.character) %>%        # keep only character vars
  mutate_all(~ parse_number(., na = c("Not sure")))  # parse out number, treating "Not sure" as NA
But that produces a new dataframe with those transformed variables. I would like to keep the variables in the original dataset, only transformed.
Upvotes: 1
Views: 580
Reputation: 16978
You could use
library(dplyr)
library(stringr)
data %>%
  mutate(
    across(
      systematic1:withdrawn & where(is.character),
      ~ ifelse(str_detect(.x, "\\d+"), str_extract(.x, "\\d+"), "Not sure")
    )
  )
which returns
# A tibble: 2 x 3
  systematic1 systematic2 systematic3
  <chr>       <chr>       <chr>
1 1           7           7
2 5           4           6
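Note that the columns in this output are still character (`<chr>`). If you want them numeric, with "Not sure" turned into NA as in your original attempt, one possible variant is to keep the `mutate(across(...))` structure and swap in `readr::parse_number()`. This is only a sketch: the example tibble below, including the `id` column and the values in `withdrawn`, is made up for illustration, since I don't know what your full dataset looks like.

```r
library(dplyr)
library(readr)

# Hypothetical example data: `id` and the contents of `withdrawn` are assumptions
data <- tibble(
  id          = 1:3,
  systematic1 = c("1 - Completely Disagree", "5 - Somewhat Agree", "Not sure"),
  systematic2 = c("7 - Completely Agree", "4 - Neither Agree nor Disagree", "6 - Agree"),
  withdrawn   = c("2 - Disagree", "Not sure", "3 - Somewhat Disagree")
)

# mutate() overwrites the selected columns in place and keeps every other column;
# assign the result (here to `result`, or back to `data`) to keep the change
result <- data %>%
  mutate(
    across(
      systematic1:withdrawn & where(is.character),
      ~ parse_number(.x, na = "Not sure")  # "Not sure" becomes NA
    )
  )
```

Because `mutate()` is used instead of `select()`, columns outside the range (such as `id` here) are left untouched, and the transformed columns keep their original names.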
Upvotes: 1