Reputation: 53
I have a data frame with a field x that contains both group names (letters in the example below) and the members of each group (numbers listed under their group name). I want to create a field that shows, for each member, the name of its group. In the data frame below, the desired output is shown in column "outcome".
library(dplyr)
df <- data.frame("x"=c("A","1","2","B","C","1","2","C","D","1"),
                 "outcome"=c("A","A","A","B","C","C","C","C","D","D")
) %>%
  mutate(
    Letter = ifelse(grepl("[A-Za-z]", x), "Letter", "No Letter")
  )
My idea is to do this with a for loop: if x is a letter, it should return that letter; if not, it should return the result of the previous iteration (i.e. the most recently found letter in x). The for loop below doesn't give the right output:
df$outcome_calc[1] <- "A"
for (i in 2:10) {
df$outcome_calc[i] <- ifelse(df$Letter[i] == "No Letter",df$outcome_calc[i-1],df$x[i])
}
Any ideas how to get the right output?
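A likely cause, assuming x was created as a factor (the data.frame() default before R 4.0.0, i.e. stringsAsFactors = TRUE), is that ifelse() drops the factor attributes and returns the integer codes of df$x[i] instead of the letters. A minimal sketch of the same loop that converts x to character first; the name x_chr is illustrative, and the first row is assumed to be a group name:

```r
# Assumption: x is a factor, as data.frame() created by default before R 4.0.0.
df <- data.frame(x = factor(c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")))
df$Letter <- ifelse(grepl("[A-Za-z]", df$x), "Letter", "No Letter")

# Work on a character copy so the loop stores labels, not factor codes.
x_chr <- as.character(df$x)

df$outcome_calc <- NA_character_
df$outcome_calc[1] <- x_chr[1]  # assumes the first row is a group name
for (i in 2:nrow(df)) {
  # A member row inherits the group from the previous row;
  # a group row starts a new group.
  df$outcome_calc[i] <- if (df$Letter[i] == "No Letter") df$outcome_calc[i - 1] else x_chr[i]
}
print(df$outcome_calc)
```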
Upvotes: 1
Views: 209
Reputation: 76402
Here are two tidyverse ways, very similar, both using the convenience function zoo::na.locf.
First:
library(tidyverse)
df %>%
mutate(na = is.na(as.numeric(as.character(x))),
outcome2 = ifelse(na, as.character(x), NA_character_),
outcome2 = zoo::na.locf(outcome2)) %>%
select(-na)
Another one:
df %>%
mutate(chr = !grepl("[[:digit:]]", x),
outcome2 = ifelse(chr, as.character(x), NA_character_),
outcome2 = zoo::na.locf(outcome2)) %>%
select(-chr)
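zoo::na.locf() carries the last non-NA value forward. If you would rather avoid the zoo dependency, the same idea can be sketched in base R with Reduce(accumulate = TRUE). This is a swapped-in technique, not how zoo implements it, and the helper name locf is hypothetical. Unlike na.locf()'s default (na.rm = TRUE), it keeps leading NAs instead of dropping them:

```r
# Hypothetical helper: last observation carried forward in base R.
locf <- function(v) {
  Reduce(function(prev, cur) if (is.na(cur)) prev else cur, v, accumulate = TRUE)
}

x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")
# Keep group names, turn member numbers into NA (same idea as outcome2 above).
outcome2 <- ifelse(grepl("[[:digit:]]", x), NA_character_, x)
print(locf(outcome2))
```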
Upvotes: 2
Reputation: 42544
dplyr
Here is a streamlined version of Rui's second approach that doesn't require creating a temporary helper column. It uses stringr::str_detect(), if_else(), and zoo::na.locf().
library(dplyr)
df %>%
  mutate(outcome2 = if_else(stringr::str_detect(x, "\\D"), as.character(x), NA_character_) %>% zoo::na.locf())
    x outcome    Letter outcome2
1   A       A    Letter        A
2   1       A No Letter        A
3   2       A No Letter        A
4   B       B    Letter        B
5   C       C    Letter        C
6   1       C No Letter        C
7   2       C No Letter        C
8   C       C    Letter        C
9   D       D    Letter        D
10  1       D No Letter        D
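The pattern "\\D" matches any non-digit character, so a value counts as a group name whenever it contains at least one character other than 0-9. A quick base R check of that rule on a few made-up values shows two edge cases: a mixed value like "B2" is treated as a group name, and an empty string is treated as a member:

```r
values <- c("A", "1", "12", "B2", "")
# TRUE where the value contains at least one non-digit character
is_group <- grepl("\\D", values)
print(is_group)
```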
data.table
For the sake of completeness, here is also a data.table approach, which I have used frequently. It uses assignment by reference to update df.
library(data.table)
setDT(df)[x %like% "\\D", outcome2 := x][, outcome2 := zoo::na.locf(outcome2)][]
     x outcome    Letter outcome2
 1:  A       A    Letter        A
 2:  1       A No Letter        A
 3:  2       A No Letter        A
 4:  B       B    Letter        B
 5:  C       C    Letter        C
 6:  1       C No Letter        C
 7:  2       C No Letter        C
 8:  C       C    Letter        C
 9:  D       D    Letter        D
10:  1       D No Letter        D
Upvotes: 0
Reputation: 5104
Using tidyr::fill, which requires NAs where your numbers were:
df = data.frame(x = c("A","1","2","B","C","1","2","C","D","1"),
stringsAsFactors = FALSE)
df$x[grepl("[0-9]+", df$x)] = NA
tidyr::fill(df, x)
x
1 A
2 A
3 A
4 B
5 C
6 C
7 C
8 C
9 D
10 D
Upvotes: 1
Reputation: 16178
Alternatively, you can avoid the for loop by using the sapply function.
First, define the positions of your letters:
pos_letter <- grep("[A-Za-z]", df$x)
Then use sapply to 1) find, for each row, the position of the nearest letter at or above it, and 2) replace each position with the corresponding letter:
df$out <- sapply(1:nrow(df), function(i) max(pos_letter[pos_letter <= i]))
df$out2 <- sapply(df$out, function(i) as.character(df[i, "x"]))
x outcome out out2
1 A A 1 A
2 1 A 1 A
3 2 A 1 A
4 B B 4 B
5 C C 5 C
6 1 C 5 C
7 2 C 5 C
8 C C 8 C
9 D D 9 D
10 1 D 9 D
You can combine both sapply calls into a single line:
sapply(1:nrow(df), function(n) as.character(df[max(pos_letter[pos_letter <= n]),"x"]))
[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
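The per-row max(pos_letter[pos_letter <= n]) can also be computed without sapply: cumsum() over the logical letter indicator counts how many letters have appeared up to each row, which is exactly the index into pos_letter of the nearest letter at or above that row. A vectorized sketch, assuming the first row is a letter (otherwise cumsum() yields 0 there and the index silently drops that row):

```r
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")
is_letter <- grepl("[A-Za-z]", x)
pos_letter <- which(is_letter)

# cumsum(is_letter) is 1,1,1,2,3,3,3,4,5,5 for this data: the number of
# letters seen up to and including each row.
out <- pos_letter[cumsum(is_letter)]
out2 <- x[out]
print(out)
print(out2)
```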
Upvotes: 1
Reputation: 21709
Here's a way to do this using a for loop:
# keeps track of previous letter
prev = ''
# output
op = c()
for (i in df$x){
# check the pattern
check = grepl(pattern = '[a-zA-Z]', x = i, ignore.case = T)
if(isTRUE(check)){
op = c(op, i)
prev = i
} else {
op = c(op, prev)
}
}
print(op)
[1] "A" "A" "A" "B" "C" "C" "C" "C" "D" "D"
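Growing op with c() creates a new, longer vector on every iteration, which is fine for ten rows but slows down on large inputs. A sketch of the same loop with the output preallocated (variable names are illustrative):

```r
x <- c("A", "1", "2", "B", "C", "1", "2", "C", "D", "1")

op <- character(length(x))  # preallocate the output
prev <- ""                  # most recent group name seen so far
for (i in seq_along(x)) {
  if (grepl("[a-zA-Z]", x[i])) {
    prev <- x[i]            # a group name starts a new group
  }
  op[i] <- prev             # members inherit the current group name
}
print(op)
```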
Upvotes: 1