Chris

Reputation: 2071

substr() takes a vector as a string, not the values of the vector as strings

I have a character vector like:

Variables <- c("EA10", "EA14", "EA15", "EA16", "EA19", "EA2", "EA21", "EA22", "EA24", "EA25", "EA28")

Variables is also a column of a data frame (df3). I want to extract everything from the third character onward (specifically, the number), and I'm using this code:

df3["#Variable"] <- substr(df3["Variables"], start=2,stop=100)

However, as you can see in the new #Variable column, substr() treats the whole vector as a single string instead of working on each value. Why? How can I solve this?

   Variables       #Variable
2       EA10   c("EA10", "EA14", "EA15",
5       EA14   c("EA10", "EA14", "EA15",
6       EA15   c("EA10", "EA14", "EA15",
7       EA16   c("EA10", "EA14", "EA15",

Upvotes: 0

Views: 49

Answers (2)

RLave

Reputation: 8364

I want to extract from the third character in the vector above (specifically, extract the number)

I would use gsub(), which finds every match of a pattern and replaces it.

Variables <- c("EA10", "EA14", "EA15")
gsub(pattern="\\D", replacement="", Variables)
#[1] "10" "14" "15"
  • pattern="\\D" matches everything that is not a digit

  • replacement="" replaces each match with an empty string
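A minimal base-R sketch of the same idea, assuming you ultimately want numbers rather than strings: gsub() strips the non-digits and as.integer() converts what is left. Keep in mind that \\D removes every non-digit anywhere in the string, so a value such as "E1A2" would collapse to "12".

```r
Variables <- c("EA10", "EA14", "EA15", "EA2")

# Remove every character that is not a digit (\\D), leaving only the digits
digits <- gsub(pattern = "\\D", replacement = "", Variables)
digits
#[1] "10" "14" "15" "2"

# Convert the remaining digit strings to integers
numbers <- as.integer(digits)
numbers
#[1] 10 14 15  2
```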


Alternatively, you could extract the digits directly, for example with str_extract() from the stringr package:

stringr::str_extract(string = Variables, pattern = "\\d+") 
# \\d+ matches one or more consecutive digits; str_extract() returns the first such run
#[1] "10" "14" "15"
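If you would rather avoid the stringr dependency, base R's regexpr() plus regmatches() performs a similar first-match extraction. This is a sketch with one caveat: regmatches() silently drops elements that have no match, so unlike str_extract() (which returns NA) the result can be shorter than the input.

```r
Variables <- c("EA10", "EA14", "EA15")

# regexpr() records where the first run of digits starts in each string, and how long it is
m <- regexpr("\\d+", Variables)

# regmatches() pulls those matched substrings out
# (caution: elements with no match are dropped, not returned as NA)
extracted <- regmatches(Variables, m)
extracted
#[1] "10" "14" "15"
```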

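Inside a data.frame, extract the column with [[ ]] (or $) so that the functions receive a character vector rather than a one-column data frame:

df[["Variable"]] <- gsub(pattern="\\D", replacement="", df[["Variable"]])

or:

df[["Variable"]] <- stringr::str_extract(df[["Variable"]], pattern="\\d+")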
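A small end-to-end sketch, assuming a data frame shaped like the one in the question (df3 with a Variables column) and reusing the question's #Variable column name. Because df3[["Variables"]] is a plain character vector, gsub() is applied to each value separately.

```r
# Hypothetical data frame mirroring the question's df3
df3 <- data.frame(
  Variables = c("EA10", "EA14", "EA15", "EA16"),
  stringsAsFactors = FALSE
)

# [[ ]] returns the character vector, so gsub() works element by element
df3[["#Variable"]] <- gsub(pattern = "\\D", replacement = "", df3[["Variables"]])
df3
```

Each row now holds its own number ("10", "14", ...) instead of the same deparsed string.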

Upvotes: 1

Joseph Clark McIntyre

Reputation: 1094

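When you reference df['Variables'], you're extracting a one-column data frame, not a vector. substr() coerces its input with as.character(), which turns that whole column into a single deparsed string along the lines of c("EA10", "EA14", ...), and that one value is then recycled into every row. Use either df$Variables or df[['Variables']] to get the underlying character vector, as I show below.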

df <- data.frame(Variables = c("EA10", "EA14", "EA15", "EA16", "EA19", "EA2", "EA21", "EA22", "EA24", "EA25", "EA28"))
substr(df[["Variables"]], start = 2, stop = 100)
[1] "A10" "A14" "A15" "A16" "A19" "A2"  "A21" "A22" "A24" "A25" "A28"
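To see the difference directly, this sketch compares what single and double brackets return, using a small hypothetical df. It also shows start = 3, which matches the question's stated goal of starting at the third character; nchar() is used as the stop so each string is taken to its end.

```r
df <- data.frame(Variables = c("EA10", "EA14", "EA15"), stringsAsFactors = FALSE)

class(df["Variables"])    # single brackets: a one-column data frame
#[1] "data.frame"
class(df[["Variables"]])  # double brackets: the underlying character vector
#[1] "character"

# as.character() on the data frame yields ONE deparsed string for the whole column,
# which is why every row received the same value in the question
length(as.character(df["Variables"]))
#[1] 1

# Start at the third character of each value and run to its end
substr(df[["Variables"]], start = 3, stop = nchar(df[["Variables"]]))
#[1] "10" "14" "15"
```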

Upvotes: 1
