Reputation: 79
I have several large data frames, each with one column (let's call it timeperiod) containing text strings. The values all end in one of a few shared suffixes (V.1to2, V.2to3, and so on), but the beginnings are different.
I want all values with the same ending to be recoded to the same value. As an example:
df <- data.frame(Location = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                                "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5"))
This prints as:
Location timeperiod
1 a A.V.1to2
2 b D.V.1to2
3 c A.V.1to2
4 d D.V.2to3
5 e A.V.3to4
6 f H.V.3to4
7 g A.V.4to5
8 h D.V.4to5
(Edit) There are a few different data frame types, and they all need the timeperiod column recoded so that numbers replace the strings. However, I oversimplified when I first asked this question: in some cases the number matches the first number in the string, but in other cases it does not. Here is an updated expected outcome that reflects this.
My desired outcome would be as follows:
df2
Location timeperiod
1 a 0
2 b 0
3 c 0
4 d 2
5 e 3
6 f 3
7 g 5
8 h 5
df2 <- data.frame(Location = c("a","b","c","d","e","f","g","h"),
                  timeperiod = c(0, 0, 0, 2, 3, 3, 5, 5))
(Edit) In this expected outcome, some numbers match the first number in the string (V.2to3 becomes 2, V.3to4 becomes 3), while others do not (V.1to2 becomes 0, V.4to5 becomes 5). My apologies for not making this clear in the first example.
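A minimal base R sketch for the edited expected outcome, assuming every value ends in a "V.<n>to<m>" suffix and that you can list the number each suffix should become: drop the prefix with sub() and index a named lookup vector by the remaining suffix. The suffix-to-number table (period_codes) is hypothetical, read off the expected output above; it would need one entry per suffix in the real data.

```r
# Sample data from the question
df <- data.frame(
  Location   = c("a", "b", "c", "d", "e", "f", "g", "h"),
  timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                 "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")
)

# Hypothetical lookup table (suffix -> number), read off the expected output
period_codes <- c("V.1to2" = 0, "V.2to3" = 2, "V.3to4" = 3, "V.4to5" = 5)

# Drop whatever prefix precedes the shared "V.<n>to<m>" ending
suffix <- sub("^.*(V\\.[0-9]+to[0-9]+)$", "\\1", df$timeperiod)

# Index the named vector by suffix; suffixes not in the table become NA
df2 <- df
df2$timeperiod <- unname(period_codes[suffix])
df2
```

Suffixes missing from period_codes come back as NA, so unmapped periods are easy to spot with is.na().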
Currently I replace each value individually, for example:
df$timeperiod[df$timeperiod =="A.V.1to2"] <- "0"
Because my data sets are large and I need to repeat this for multiple data frames with different prefixes on the time period values, I'd like to use dplyr, like this:
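library(dplyr)
# This attempt does not run: revalue() comes from plyr, not dplyr, and
# ends_with() selects columns by name rather than matching string values
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.1to2)="0"))
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.2to3)="2"))
#etc..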
That way I could repeat it with different values and sheets. But this doesn't work, and renaming every value individually seems inefficient, so any faster solution would do.
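The ends_with() idea can be applied to string values with base R's endsWith() (available since R 3.3.0). The sketch below wraps it in a hypothetical helper, recode_by_ending(), and applies it to a list of data frames with lapply(), so one lookup table covers every sheet. The two example data frames and the period_codes values are made up for illustration.

```r
# Hypothetical helper: for each ending listed in `codes`, overwrite every value
# of `col` that ends with it. endsWith() tests string values, not column names.
recode_by_ending <- function(data, codes, col = "timeperiod") {
  old <- as.character(data[[col]])          # also handles factor columns
  new <- rep(NA_real_, length(old))         # values with no listed ending stay NA
  for (ending in names(codes)) {
    new[endsWith(old, ending)] <- codes[[ending]]
  }
  data[[col]] <- new
  data
}

# Hypothetical lookup table shared by all data frames
period_codes <- c("V.1to2" = 0, "V.2to3" = 2, "V.3to4" = 3, "V.4to5" = 5)

# Made-up data frames with different prefixes
df_a <- data.frame(Location = c("a", "b"), timeperiod = c("A.V.1to2", "D.V.4to5"))
df_b <- data.frame(Location = c("x", "y"), timeperiod = c("H.V.3to4", "Q.V.2to3"))

# Apply the same recoding to every data frame in the list
recoded <- lapply(list(a = df_a, b = df_b), recode_by_ending, codes = period_codes)
recoded$a$timeperiod
recoded$b$timeperiod
```

Because the data frame is the first argument, the helper can also sit in a dplyr pipeline, e.g. df %>% recode_by_ending(period_codes).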
Upvotes: 0
Views: 771
Reputation: 78927
We could use str_extract to pull out the first run of digits:
library(dplyr)
library(stringr)
df %>%
mutate(timeperiod = str_extract(timeperiod, '\\d+'))
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
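Note that str_extract() returns character strings and picks up the first number, so this reproduces the original example but not the edited expected output (V.1to2 gives 1, not 0). If integers are wanted without loading stringr, a base R sketch of the same first-number extraction:

```r
x <- c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
       "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")

# Skip leading non-digits, capture the first run of digits, discard the rest,
# then convert the captured text to integer
first_num <- as.integer(sub("^[^0-9]*([0-9]+).*$", "\\1", x))
first_num
```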
Upvotes: 1
Reputation: 496
Maybe this is what you are looking for: remove the letters and punctuation, then keep the first remaining character.
df <- data.frame(Location = c("a","b","c","d","e","f","g","h"),
                 timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
                                "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5"))
df$timeperiod <- substr(gsub('[[:alpha:]]|[[:punct:]]', '', df$timeperiod), 1, 1)
df
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
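One caveat: substr(..., 1, 1) keeps a single character, so a hypothetical value such as "D.V.10to11" becomes "1", and a digit in the prefix (e.g. a made-up "B2.V.3to4") is picked up instead of the period. A sketch that anchors on the "V." and captures the whole starting number, assuming every value has the "V.<n>to<m>" form:

```r
# Made-up inputs: a two-digit period and a prefix containing a digit
x <- c("A.V.1to2", "D.V.10to11", "B2.V.3to4")

# Approach from the answer above: strip letters/punctuation, keep the first character
first_char <- substr(gsub('[[:alpha:]]|[[:punct:]]', '', x), 1, 1)

# Anchored alternative: capture the full number between "V." and "to"
start_num <- as.integer(sub("^.*V\\.([0-9]+)to[0-9]+$", "\\1", x))

first_char
start_num
```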
Upvotes: 0
Reputation: 9858
We can use dplyr and stringr. First extract the last 6 characters of timeperiod, then group_by timeperiod, and finally use cur_group_id:
library(dplyr)
library(stringr)
df %>% mutate(timeperiod = str_extract(timeperiod, '.{6}$'))%>%
group_by(timeperiod)%>%
mutate(timeperiod = cur_group_id())%>%
ungroup()
# A tibble: 8 × 2
Location timeperiod
<chr> <int>
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
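cur_group_id() numbers the groups 1, 2, 3 and so on in the sorted order of the suffixes, so it also gives 1 to 4 rather than the edited expected codes. A base R sketch of the same grouping with match(), followed by a lookup that turns group numbers into the desired codes. The codes vector is an assumption read off the expected output, and like the answer this relies on every suffix being exactly 6 characters long.

```r
x <- c("A.V.1to2", "D.V.1to2", "A.V.1to2", "D.V.2to3",
       "A.V.3to4", "H.V.3to4", "A.V.4to5", "D.V.4to5")

# Same first step as the dplyr answer: keep the last 6 characters
suffix <- substr(x, nchar(x) - 5, nchar(x))

# Base R counterpart of cur_group_id(): position of each suffix
# among the sorted unique suffixes
group_id <- match(suffix, sort(unique(suffix)))

# Hypothetical codes for groups 1..4, taken from the edited expected output
codes <- c(0, 2, 3, 5)
timeperiod <- codes[group_id]
timeperiod
```

Since the group numbers follow the sort order of the suffixes, it is worth checking sort(unique(suffix)) before relying on positional codes.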
Upvotes: 0