Reputation: 275
I'd like to use str_extract_all from the stringr package to extract digits from strings, and I'd like the output as numerics in a column of an existing dataframe. The problem is that the output of str_extract_all is a list.
UPDATE: My overall goal is to use the extracted digits to filter the columns of another dataframe called film_main. film_main is where the data in the originally posted dataframe film comes from.
So, if a column in film_main has the digits 1 and 0 in its column name, then the only permitted entries in that column are 1s, 0s, and NAs. Any other entry in that column should be set to NA. See the pseudocode and film_main_desired below.
Sorry for not being very clear in my original post. I thought less was more, but I ended up not presenting my problem very well.
# Load package
library(stringr)
# Toy dataset
film_main = data.frame("grey..0..yellow..1.."=c(0, 1, 0, NA, 2), "grey..0..brown..1.."=c(3, 0, 0, NA, 2), "grey..0..blue..1...brown..2.."=c(0, 2, 1, 6, 1), "3grey..0..purple..1...brown..2.."=c(0, 1, 2, 3, NA), "3grey..0..purple..1...brown..2..brown..3.."=c(0, 1, 2, 3, NA))
# Extracting digits using stringr::str_extract_all
film = data.frame(var = names(film_main))
film$var2 = str_extract_all(film$var, "[:digit:]+")
# Result for string extraction
class(film$var2)
# [1] "list"
# Desired result for string extraction
class(film$var2)
# [1] "numeric"
# Filtering film_main - PSEUDOCODE
lapply(film_main, function(x) ifelse(x %in% SOME_SORT_OF_A_FILTER_FEATURING_PERMITTED_DIGITS, x, NA))
# OVERALL GOAL
film_main_desired= data.frame("grey..0..yellow..1.."=c(0, 1, 0, NA, NA), "grey..0..brown..1.."=c(NA, 0, 0, NA, NA), "grey..0..blue..1...brown..2.."=c(0, 2, 1, NA, 1), "3grey..0..purple..1...brown..2.."=c(0, 1, 2, 3, NA), "3grey..0..purple..1...brown..2..brown..3.."=c(0, 1, 2, 3, NA))
Thanks for any help!
Upvotes: 0
Views: 2924
Reputation: 69
If the digits in each string are all next to each other, this is a bit shorter:
library(dplyr)
library(stringr)
film2 <- film %>%
  # str_extract() returns only the first run of digits in each string
  mutate(var2 = str_extract(var, "[:digit:]+"),
         var2 = as.numeric(var2))
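One caveat worth checking against your data: str_extract returns only the first match per string, whereas str_extract_all returns every match. A small sketch on one of the column names from the question (the variable names `x`, `first_only` and `all_runs` are just for illustration):

```r
library(stringr)

# One column name from the question's toy data
x <- "grey..0..blue..1...brown..2.."

# str_extract() keeps only the first run of digits
first_only <- str_extract(x, "[[:digit:]]+")

# str_extract_all() returns a list with one character vector per input string
all_runs <- str_extract_all(x, "[[:digit:]]+")

as.numeric(first_only)     # 0
as.numeric(all_runs[[1]])  # 0 1 2
```

So for names that contain several separate digits, as most of the ones in the question do, the shorter version would drop all but the first.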
Upvotes: 0
Reputation: 188
Is this what you're after? It uses a couple of other tidyverse packages, dplyr and tidyr, alongside stringr.
library(dplyr)
library(tidyr)
library(stringr)
film2 <- film %>%
  mutate(var2 = str_extract_all(var, "[:digit:]+")) %>%
  # Expand the list column: one row per extracted digit
  unnest(cols = var2) %>%
  mutate(var2 = as.numeric(var2))
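For the filtering goal described in the update, one possible approach is to keep the extracted digits as a list (one numeric vector per column) and pair it with the columns of film_main. This is only a sketch using base R plus stringr; `permitted` and `film_main_filtered` are names introduced here for illustration, and it assumes every digit in a column name is a permitted value for that column.

```r
library(stringr)

# Toy data from the question (names starting with a digit get an "X" prefix)
film_main <- data.frame(
  "grey..0..yellow..1.."                       = c(0, 1, 0, NA, 2),
  "grey..0..brown..1.."                        = c(3, 0, 0, NA, 2),
  "grey..0..blue..1...brown..2.."              = c(0, 2, 1, 6, 1),
  "3grey..0..purple..1...brown..2.."           = c(0, 1, 2, 3, NA),
  "3grey..0..purple..1...brown..2..brown..3.." = c(0, 1, 2, 3, NA)
)

# One numeric vector of permitted values per column, taken from the column name
permitted <- lapply(str_extract_all(names(film_main), "[[:digit:]]+"), as.numeric)

# Keep a value only if it is permitted for its column; everything else becomes NA
film_main_filtered <- film_main
film_main_filtered[] <- Map(
  function(col, ok) ifelse(col %in% ok, col, NA),
  film_main, permitted
)

film_main_filtered
```

On the toy data this should reproduce the values in film_main_desired. Existing NAs stay NA because `NA %in% ok` is FALSE.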
Upvotes: 2