parksnrec1

Reputation: 79

Rename values in a single column that end with the same string

I have several large data frames with one column (let's call it timeperiod) containing text strings. Many values end in the same string (e.g. V.1to2 or V.2to3), but their beginnings differ.

I want to replace all values that share the same ending with the same new value. As an example:

df <- data.frame (Location  = c("a","b","c","d","e","f","g","h"),
                   timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))

Printed, it looks like this:

  Location timeperiod
1        a   A.V.1to2
2        b   D.V.1to2
3        c   A.V.1to2
4        d   D.V.2to3
5        e   A.V.3to4
6        f   H.V.3to4
7        g   A.V.4to5
8        h   D.V.4to5

***(edit) There are a few different data frame types, and all of them need the timeperiod column changed so that each string is replaced by a number. However, I oversimplified when I first asked this question: in some cases the number matches the first number in the string, but in other cases it does not. Here is an updated expected outcome that reflects this situation.***

My desired outcome would be as follows:

df2
  Location timeperiod
1        a          0
2        b          0
3        c          0
4        d          2
5        e          3
6        f          3
7        g          5
8        h          5

df2 <- data.frame (Location  = c("a","b","c","d","e","f","g","h"),
                  timeperiod = c(0, 0, 0, 2, 3, 3, 5, 5))


(edit) In this expected outcome, some values match the first number in the string (D.V.2to3 becomes 2, and the V.3to4 values become 3), but others do not (the V.1to2 values become 0, and the V.4to5 values become 5). My apologies for not making this clear in the first example.
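Because the target numbers do not always follow the digits in the string, one option is an explicit lookup table keyed by the shared ending. A minimal base R sketch, assuming every value has the form prefix.ending where the prefix itself contains no dot (true for the sample data); the codes vector is a hypothetical mapping copied from the df2 expected outcome:

```r
df <- data.frame(Location   = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                                "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: shared ending -> new code (values taken from df2)
codes <- c(V.1to2 = 0, V.2to3 = 2, V.3to4 = 3, V.4to5 = 5)

# Drop everything up to and including the first dot ("A.V.1to2" -> "V.1to2");
# assumes the prefix never contains a dot
suffix <- sub("^[^.]*\\.", "", df$timeperiod)

# Index the named vector by ending; unname() drops the names carried over
df$timeperiod <- unname(codes[suffix])
df
```

An ending missing from codes becomes NA, and the same codes vector can be reused for every data frame that shares these endings.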

Sample code:

df$timeperiod[df$timeperiod =="A.V.1to2"] <- "0"

Because of the size of my data set and the need to repeat this for multiple data frames with different prefixes for the time period values, I'd like to use dplyr like this:

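library(dplyr)
# Attempt (does not work): revalue() comes from plyr, not dplyr, and
# ends_with() selects columns by name rather than matching values
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.1to2)="0"))
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.2to3)="2"))
# etc.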

That way I could repeat it with different values and sheets. But this doesn't work, and renaming every specific value seems inefficient, so any faster solution would serve the purpose.
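A working version of the dplyr idea can be sketched with case_when() and base R's endsWith(), which tests the values of a column; the tidyselect helper ends_with() (re-exported by dplyr) only selects columns by name. The function name recode_timeperiod is hypothetical, and the endings and codes are taken from the df2 expected outcome:

```r
library(dplyr)

df <- data.frame(Location   = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                                "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: map each shared ending to its code, whatever the prefix
recode_timeperiod <- function(data) {
  data %>%
    mutate(timeperiod = case_when(
      endsWith(timeperiod, "V.1to2") ~ 0,
      endsWith(timeperiod, "V.2to3") ~ 2,
      endsWith(timeperiod, "V.3to4") ~ 3,
      endsWith(timeperiod, "V.4to5") ~ 5
    ))
}

df2 <- recode_timeperiod(df)
df2
```

Values whose ending is not listed become NA, which makes unexpected endings easy to spot. To repeat the recode for several data frames, the function could be applied to each one, for example with lapply() over a list of them.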

Upvotes: 0

Views: 771

Answers (3)

TarJae

Reputation: 78927

We could use str_extract to pull out the first number in each string:

library(dplyr)
library(stringr)

df %>%
  mutate(timeperiod = str_extract(timeperiod, '\\d+'))

  Location timeperiod
1        a          1
2        b          1
3        c          1
4        d          2
5        e          3
6        f          3
7        g          4
8        h          4

Upvotes: 1

Isa

Reputation: 496

Maybe this is what you are looking for:

df <- data.frame(Location   = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))

df$timeperiod <- substr(gsub('[[:alpha:]]|[[:punct:]]', '', df$timeperiod), 1, 1)

df

  Location timeperiod
1        a          1
2        b          1
3        c          1
4        d          2
5        e          3
6        f          3
7        g          4
8        h          4
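Note that substr(..., 1, 1) keeps only the first remaining digit, so a value such as A.V.10to11 (hypothetical, not in the sample data) would become "1". A sketch that captures the whole first number instead, assuming every value ends in V.<number>to<number>:

```r
# The last element is a hypothetical two-digit period, added for illustration
x <- c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
       "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5", "A.V.10to11")

# Capture all digits between "V." and "to"; assumes each value ends in V.<n>to<m>
first_num <- as.integer(sub("^.*V\\.([0-9]+)to[0-9]+$", "\\1", x))
first_num
```

This still returns the first number, so it reproduces the original expectation rather than the updated one.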

Upvotes: 0

GuedesBF

Reputation: 9858

We can use dplyr and stringr. First, extract the last 6 characters of timeperiod; then group_by timeperiod, and finally use cur_group_id() to number each group:

library(dplyr)
library(stringr)

df %>%
  mutate(timeperiod = str_extract(timeperiod, '.{6}$')) %>%
  group_by(timeperiod) %>%
  mutate(timeperiod = cur_group_id()) %>%
  ungroup()

# A tibble: 8 × 2
  Location timeperiod
  <chr>         <int>
1 a                 1
2 b                 1
3 c                 1
4 d                 2
5 e                 3
6 f                 3
7 g                 4
8 h                 4

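For comparison, the same numbering can be sketched in base R without grouping: take the last 6 characters with substring() and number each distinct ending by its position in sorted order. This assumes, as the answer does, that every ending is exactly 6 characters long; sort order can depend on locale, but for these endings it matches the group order shown above.

```r
df <- data.frame(Location   = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                                "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5"),
                 stringsAsFactors = FALSE)

# Last 6 characters, like str_extract(timeperiod, '.{6}$'); assumes fixed-width endings
suffix <- substring(df$timeperiod, nchar(df$timeperiod) - 5)

# Position of each ending among the sorted distinct endings (1, 2, 3, ...)
df$timeperiod <- match(suffix, sort(unique(suffix)))
df
```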
Upvotes: 0
