user2327621

Reputation: 997

R variable string replacement in a data frame

I have a dataframe that looks as follows:

df <- data.frame(one=c("s1_below_10", "s2_below_20"), 
                 two=c("s3_above_10","s4_above_10"))

I want to replace each string with the number that precedes its first underscore. In other words, the desired output is

1   3
2   4

I would like to know how I can perform this replacement (the dataset is very large). Thanks for your help.

Upvotes: 2

Views: 2007

Answers (2)

thelatemail

Reputation: 93938

The basic gsub call would be something like:

gsub("^.+?(\\d+)_.+","\\1",df$one)
[1] "1" "2"

You could then apply the same pattern to each column with lapply (keeping the lazy .+? so that a multi-digit number is captured whole):

data.frame(lapply(df, gsub, pattern="^.+?(\\d+)_.+", replacement="\\1"))
  one two
1   1   3
2   2   4
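
Note that the result above still holds character strings, not numbers. As a rough sketch of one alternative (not the answerer's code), the helper below cuts each string at its first underscore, strips the non-digits from what is left, and converts to numeric. The name first_number is made up for illustration, and the sketch assumes every value contains digits before its first underscore.

```r
df <- data.frame(one = c("s1_below_10", "s2_below_20"),
                 two = c("s3_above_10", "s4_above_10"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: number before the first underscore, as numeric.
first_number <- function(x) {
  # sub() replaces only the first match, which starts at the first underscore
  prefix <- sub("_.*$", "", as.character(x))   # "s1_below_10" -> "s1"
  as.numeric(gsub("[^0-9]", "", prefix))       # "s1" -> 1
}

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, first_number)
```

Assigning into df[] rather than wrapping the result in data.frame() keeps the original column names and avoids any factor conversion on older R versions.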

Upvotes: 4

Marius

Reputation: 60180

If the values you want are always the second character of the string (which seems to be true of all your examples), you can do this with substr:

data.frame(lapply(df, substr, 2, 2))

Output:

  one two
1   1   3
2   2   4
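
If the number can run to more than one digit, a fixed substr(x, 2, 2) would cut it short. A possible variation, assuming the prefix is always exactly one character (as in the question's examples), is to take everything from position 2 up to the character just before the first underscore. The vector x here is made-up sample data:

```r
x <- c("s1_below_10", "s12_below_20")

# Fixed-position extraction keeps only the second character
second_char <- substr(x, 2, 2)

# Locate the first underscore in each string (fixed = TRUE: literal match)
underscore <- as.integer(regexpr("_", x, fixed = TRUE))

# Take characters 2 .. (underscore - 1) so multi-digit numbers survive
up_to_underscore <- substr(x, 2, underscore - 1)
```

Here second_char is c("1", "1") while up_to_underscore is c("1", "12").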

Upvotes: 2
