Reputation: 4338
I have a data frame that looks like this:
> sample
# A tibble: 6 x 10
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <chr>
1 1 2 3 4 8 NA NA NA NA orioles
2 1 2 3 4 9 13 NA NA NA nationals
3 1 2 3 5 10 14 16 18 NA dodgers
4 1 2 3 5 10 14 17 19 NA cardinals
5 1 2 3 6 11 NA NA NA NA giants
6 1 2 3 7 12 15 NA NA NA padres
What I'd like to do is concatenate the Supplier column with any Level column if all the values between them are NA
. Another way I was thinking about this was that if the column to the right of the Level column is NA
then to concatenate that column with the supplier column.
I was thinking a for loop but I haven't figured out how to implement the logic. The logic I was thinking is something like:
for (level in levels) {
if is.na(level n + 1) {
paste0(level, Supplier)
}
else {
level}
}
I could also do a bunch of mutate
calls like this but it seems super repetitive and unnecessary:
sample %>%
mutate(
Level_5 = ifelse(
is.na(Level_6),
paste0(Supplier, "<br>", Level_5),
Level_5)
)
Here's the dput of the data:
structure(list(Level_1 = c(1, 1, 1, 1, 1, 1), Level_2 = c(2,
2, 2, 2, 2, 2), Level_3 = c(3, 3, 3, 3, 3, 3), Level_4 = c(4,
4, 5, 5, 6, 7), Level_5 = c(8, 9, 10, 10, 11, 12), Level_6 = c(NA,
13, 14, 14, NA, 15), Level_7 = c(NA, NA, 16, 17, NA, NA), Level_8 = c(NA,
NA, 18, 19, NA, NA), Level_9 = c(NA, NA, NA, NA, NA, NA), Supplier = c("orioles",
"nationals", "dodgers", "cardinals", "giants", "padres")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 4
Views: 170
Reputation: 21938
Final Update
I realized my mistake on trying to find the max value in every row and replace it with desired concatenated string. So I came up with another solution which only replaces the last non-NA
value (it can also be not the max values of the row), given all values are not numeric. So here is my final solution:
library(dplyr)
library(stringr)
library(purrr)
df %>%
pmap_dfr(., ~ {x <- c(...)[-10][!is.na(c(...)[-10])];
ind <- which(c(...) == x[length(x)]);
replace(c(...), ind[length(ind)], str_c(..10, x[length(x)], sep = "_"))}
)
# A tibble: 6 x 10
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 2 3 4 orioles_8 NA NA NA NA orioles
2 1 2 3 4 9 nationals_13 NA NA NA nationals
3 1 2 3 5 10 14 16 dodgers_18 NA dodgers
4 1 2 3 5 10 14 17 cardinals_19 NA cardinals
5 1 2 3 6 giants_11 NA NA NA NA giants
6 1 2 3 7 12 padres_15 NA NA NA padres
Upvotes: 3
Reputation: 26238
Combined approach with c_across
and across
library(tidyverse)
df %>% rowwise() %>%
mutate(dummy = max(which(!is.na(c_across(starts_with('Level')))))) %>% ungroup() %>%
mutate(across(starts_with('Level_'),
~ifelse(as.numeric(str_remove(cur_column(), 'Level_')) == dummy, paste(Supplier, ., sep = '_'), .)))
# A tibble: 6 x 11
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier dummy
<dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <chr> <lgl> <chr> <int>
1 1 2 3 4 orioles_8 NA NA NA NA orioles 5
2 1 2 3 4 9 nationals_13 NA NA NA nationals 6
3 1 2 3 5 10 14 16 dodgers_18 NA dodgers 8
4 1 2 3 5 10 14 17 cardinals_19 NA cardinals 8
5 1 2 3 6 giants_11 NA NA NA NA giants 5
6 1 2 3 7 12 padres_15 NA NA NA padres 6
Combining which
strategy used above my friend Anoushiravan's answer can be simplified to:
purrr::pmap_dfr
do these
n
i
temp
variable of length n
and having a T
at i
and F
elsewherereplace
to replace i
th variable (using temp
) with desired valuesdf %>%
pmap_dfr(., ~ {n <- ncol(df); i <- max(which(!is.na(c(...)[-n])));
tmp <- rep(F, n); tmp[i] <- T;
replace(c(...), tmp, paste(c(...)[n], c(...)[i], sep = '_'))})
# A tibble: 6 x 10
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 2 3 4 orioles_8 NA NA NA NA orioles
2 1 2 3 4 9 nationals_13 NA NA NA nationals
3 1 2 3 5 10 14 16 dodgers_18 NA dodgers
4 1 2 3 5 10 14 17 cardinals_19 NA cardinals
5 1 2 3 6 giants_11 NA NA NA NA giants
6 1 2 3 7 12 padres_15 NA NA NA padres
Upvotes: 2
Reputation: 79246
Very interesting question. Here is my approach without pivot_longer
:
library(dplyr)
# check is na
ind <- !is.na(df1)
# save vector who fullfill assumption value before first NA
values <- as.vector(tapply(df1[ind], row(df1)[ind], tail, 1))
# bind to dataframe
df2 <- cbind(df, values)
# accomplish the task
df2 %>%
mutate(across(Level_1:Level_9, ~ case_when(. == values ~ str_c(Supplier ,.),
. != values ~ as.character(.)))) %>%
select(-values)
Output:
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9 Supplier
1 1 2 3 4 orioles8 <NA> <NA> <NA> <NA> orioles
2 1 2 3 4 9 nationals13 <NA> <NA> <NA> nationals
3 1 2 3 5 10 14 16 dodgers18 <NA> dodgers
4 1 2 3 5 10 14 17 cardinals19 <NA> cardinals
5 1 2 3 6 giants11 <NA> <NA> <NA> <NA> giants
6 1 2 3 7 12 padres15 <NA> <NA> <NA> padres
Upvotes: 2
Reputation: 11546
Another approach:
library(tidyr)
library(dplyr)
df %>% mutate(across(contains('Level'), ~ as.character(.))) %>%
mutate(across(contains('Level'), ~ coalesce(., Supplier))) %>% select(-Supplier) %>%
mutate(ID = row_number()) %>%
pivot_longer(cols = -ID) %>% group_by(ID) %>%
mutate(value = case_when(duplicated(value) ~ NA_character_, TRUE ~ value)) %>% pivot_wider(names_from = name, values_from = value) %>%
ungroup() %>% select(-ID)
# A tibble: 6 x 9
Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8 Level_9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 2 3 4 8 orioles NA NA NA
2 1 2 3 4 9 13 nationals NA NA
3 1 2 3 5 10 14 16 18 dodgers
4 1 2 3 5 10 14 17 19 cardinals
5 1 2 3 6 11 giants NA NA NA
6 1 2 3 7 12 15 padres NA NA
Upvotes: 2
Reputation: 16998
To be honest, I'm not 100% sure about your desired output. Using dplyr
and tidyr
:
library(tidyr)
library(dplyr)
sample %>%
pivot_longer(cols=starts_with("Level_"), names_prefix="Level_", names_to="level") %>%
drop_na() %>%
group_by(Supplier) %>%
mutate(new_val=ifelse(level==max(level), paste0(Supplier, "<br>", value), value)) %>%
select(-value) %>%
pivot_wider(names_from=level, names_prefix="Level_", values_from=new_val)
returns
# A tibble: 6 x 9
# Groups: Supplier [6]
Supplier Level_1 Level_2 Level_3 Level_4 Level_5 Level_6 Level_7 Level_8
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 orioles 1 2 3 4 orioles<br>8 NA NA NA
2 nationals 1 2 3 4 9 nationals<br>13 NA NA
3 dodgers 1 2 3 5 10 14 16 dodgers<br>18
4 cardinals 1 2 3 5 10 14 17 cardinals<br>19
5 giants 1 2 3 6 giants<br>11 NA NA NA
6 padres 1 2 3 7 12 padres<br>15 NA NA
I lost the Level_9 column since it contained only NA
. You can easily add it again.
Upvotes: 3