Robbie

Reputation: 275

str_extract_all returns a list but I want a column in a dataframe

I'd like to use str_extract_all from the stringr package to extract digits from strings, and I'd like the output as numerics in a column of an existing dataframe. The problem is that str_extract_all returns a list.

UPDATE: My overall goal is to use the extracted digits to filter the columns of another dataframe called film_main. film_main is where the data in the originally posted dataframe film comes from.

So, if a column in film_main has the digits 1 and 0 in its column name, then the only permitted entries in that column are 1s, 0s, and NAs. Any other entry in that column should be set to NA. See the pseudocode and film_main_desired below.

Sorry for not being very clear in my original post. I thought less was more, but I ended up not presenting my problem very well.

# Load package
library(stringr)

# Toy dataset
film_main = data.frame("grey..0..yellow..1.."=c(0, 1, 0, NA, 2), "grey..0..brown..1.."=c(3, 0, 0, NA, 2), "grey..0..blue..1...brown..2.."=c(0, 2, 1, 6, 1), "3grey..0..purple..1...brown..2.."=c(0, 1, 2, 3, NA), "3grey..0..purple..1...brown..2..brown..3.."=c(0, 1, 2, 3, NA))


# Extracting digits using stringr::str_extract_all
film = data.frame(var = names(film_main))
film$var2 = str_extract_all(film$var, "[:digit:]+")

# Result for string extraction
class(film$var2)
# [1] "list"

# Desired result for string extraction
class(film$var2)
# [1] "numeric"

# Filtering film_main - PSEUDOCODE (the filter is a placeholder, so this does not run as is)
lapply(film_main, function(x) ifelse(x %in% SOME_SORT_OF_A_FILTER_FEATURING_PERMITTED_DIGITS, x, NA))


# OVERALL GOAL 
film_main_desired= data.frame("grey..0..yellow..1.."=c(0, 1, 0, NA, NA), "grey..0..brown..1.."=c(NA, 0, 0, NA, NA), "grey..0..blue..1...brown..2.."=c(0, 2, 1, NA, 1), "3grey..0..purple..1...brown..2.."=c(0, 1, 2, 3, NA), "3grey..0..purple..1...brown..2..brown..3.."=c(0, 1, 2, 3, NA))

Thanks for any help!

Upvotes: 0

Views: 2924

Answers (2)

Wojty

Reputation: 69

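If the digits in each string are all next to each other, this is a bit shorter, since str_extract returns only the first match:

library(dplyr)
library(stringr)

# str_extract() keeps only the first run of digits in each string
film2 <- film %>% 
  mutate(var2 = str_extract(var, "[:digit:]+"),
         var2 = as.numeric(var2))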
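
To see what that restriction means in practice, here is a small sketch on one of the column names from the question. The variable names nm, first_only and all_runs are only for illustration, and "[0-9]+" is used as a plain-ASCII stand-in for "[:digit:]+".

```r
library(stringr)

# One of the column names from the question's toy dataset
nm <- "grey..0..blue..1...brown..2.."

# str_extract() returns a character vector holding only the first match
first_only <- as.numeric(str_extract(nm, "[0-9]+"))

# str_extract_all() returns a list with one character vector per input string,
# so [[1]] picks out the matches for this single name
all_runs <- as.numeric(str_extract_all(nm, "[0-9]+")[[1]])
```

Here first_only holds a single value, while all_runs holds every digit found in the name, so the shorter version only fits names that contain one run of digits.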

Upvotes: 0

NeilC

Reputation: 188

Is this what you're after? Using a couple of other tidyverse packages - dplyr and tidyr - alongside stringr.

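library(dplyr)
library(tidyr)
library(stringr)

film2 <- film %>% 
  # list column: one character vector of digit runs per name
  mutate(var2 = str_extract_all(var, "[:digit:]+")) %>%
  # expand the list column to one row per extracted value
  unnest(var2) %>%
  # convert the extracted strings to numerics
  mutate(var2 = as.numeric(var2))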
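
For the overall goal in the question's update, the list output can be used directly, without unnesting. Below is a minimal sketch, assuming the permitted values for each column are exactly the digit runs in its name. The names permitted and film_main_filtered are made up for this example, and "[0-9]+" is used as a plain-ASCII stand-in for "[:digit:]+". Note that data.frame() prefixes the names starting with a digit with "X", so "3grey..." becomes "X3grey..." and 3 counts as a permitted value there.

```r
library(stringr)

# Toy dataset copied from the question
film_main <- data.frame(
  "grey..0..yellow..1.." = c(0, 1, 0, NA, 2),
  "grey..0..brown..1.." = c(3, 0, 0, NA, 2),
  "grey..0..blue..1...brown..2.." = c(0, 2, 1, 6, 1),
  "3grey..0..purple..1...brown..2.." = c(0, 1, 2, 3, NA),
  "3grey..0..purple..1...brown..2..brown..3.." = c(0, 1, 2, 3, NA)
)

# One numeric vector of permitted values per column, taken from the column name
permitted <- lapply(str_extract_all(names(film_main), "[0-9]+"), as.numeric)

# Walk over columns and their permitted values in parallel;
# anything not permitted (including existing NAs) ends up as NA
film_main_filtered <- film_main
film_main_filtered[] <- Map(
  function(col, ok) ifelse(col %in% ok, col, NA_real_),
  film_main, permitted
)
```

Assigning into film_main_filtered[] keeps the data frame structure and column names, so the result can be compared against film_main_desired from the question.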

Upvotes: 2
