Reputation: 89
I have a data frame that looks like this:
line = c(1, 2, NA, 4 ,5, NA, 7)
group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D", "5.0 Group E", "6.0 Group F", "7.0 Group G")
df <- data.frame(line, group)
view(df)
line group
1 1 1.0 Group A
2 2 2.0 Group B
3 NA 3.0 Group C
4 4 4.0 Group D
5 5 5.0 Group E
6 NA 6.0 Group F
7 7 7.0 Group G
What I want to do is to find all the NA value in the "line" column and place a row underneath that row in "group" column saying "Not Applicable". So that the new data frame should look like:
view(df)
line group
1 1 1.0 Group A
2 2 2.0 Group B
3 NA 3.0 Group C
4 NA Not Applicable
5 4 4.0 Group D
6 5 5.0 Group E
7 NA 6.0 Group F
8 NA Not Applicable
9 7 7.0 Group G
I am thinking about using an ifelse statement or using case_when from dplyr. But I don't know how to work it out. Does anyone have any suggestion?
Thank you!
Upvotes: 1
Views: 11615
Reputation: 115
I feel like the tidyr::uncount
function might also be what you're looking for. Just mark line==NA rows with a 2 in a new column, which we'll call n
, and then uncount
will duplicate each row based on the value in n
. By mutating an ifelse
that sets NA rows n
== 2, we essentially only duplicate the NA rows directly below where they were, as opposed to at the bottom of the df and thereby needing to arrange
. Finally, the dplyr::mutate_at
just says to set group
equal to "Not Applicable" if the line
and the previous row's line
(i.e. lag(line)) are both NA (which tells us to focus just on these duplicated rows). I'd like to think this method is also pretty scalable too!
library(tidyverse)
df %>%
modify_if(is.factor, as.character) %>%
mutate(n = ifelse(is.na(line), 2, 1)) %>%
uncount(n) %>%
mutate_at(vars(group), ~ifelse(is.na(line) & is.na(lag(line)), "Not Applicable", .))
# line group
# 1 1 1.0 Group A
# 2 2 2.0 Group B
# 3 NA 3.0 Group C
# 4 NA Not Applicable
# 5 4 4.0 Group D
# 6 5 5.0 Group E
# 7 NA 6.0 Group F
# 8 NA Not Applicable
# 9 7 7.0 Group G
Upvotes: 1
Reputation: 5297
Create a separate data.frame, ds_blank
, and then use a union query to stack, then order it by a temp variable called index
.
library(magrittr)
na_index <- which(is.na(df$line))
ds_blank <- tibble::tibble(
index = na_index + .5,
line = rep(NA_real_ , length(na_index)),
group = rep("Not Applicable" , length(na_index))
)
df <- df %>%
tibble::rowid_to_column("index") %>%
dplyr::union(ds_blank) %>%
dplyr::arrange(index) %>%
dplyr::select(-index)
> df
line group
1 1 1.0 Group A
2 2 2.0 Group B
3 NA 3.0 Group C
4 NA Not Applicable
5 4 4.0 Group D
6 5 5.0 Group E
7 NA 6.0 Group F
8 NA Not Applicable
9 7 7.0 Group G
I wanted to try tibble::add_row()
, but that apparently doesn't allow multiple rows to be inserted if you specify a position.
...using @Gregor's tip to use a for loop. Notice the na_index
is reverse sorted now.
na_index <- sort(which(is.na(df$line)), decreasing = T)
for( i in na_index ) {
df <- df %>%
tibble::add_row(
line = NA_integer_,
group = "Not Applicable",
.after = i
)
}
Upvotes: 3
Reputation: 20095
An solution could be achieved using dplyr
.
The approach is simple. Add a column representing row number
. Take out rows having line value as NA
. Replace group
with Not Applicable
and increment row number
column by 0.5. Bind those two data frames.
library(dplyr)
df %>% mutate(rownum = row_number()) %>%
bind_rows(., filter(., is.na(line)) %>%
mutate(group = "Not Applicable", rownum = rownum+.5)) %>%
arrange(rownum) %>%
select(-rownum)
# line group
# 1 1 1.0 Group A
# 2 2 2.0 Group B
# 3 NA 3.0 Group C
# 4 NA Not Applicable
# 5 4 4.0 Group D
# 6 5 5.0 Group E
# 7 NA 6.0 Group F
# 8 NA Not Applicable
# 9 7 7.0 Group G
Limitations mentioned by @Gregor are valid. Numeric column can have value as NA
but not blank
.
Upvotes: 3
Reputation: 146224
Here's a base R method: split the data by the cumulative NA count, add in the new lines, recombine.
df$group = as.character(df$group)
split_df = split(df, cumsum(is.na(df$line)))
split_df[-1] = lapply(split_df[-1], function(d) rbind(d[1, ], data.frame(line = NA, group = "Not applicable"), d[-1, ]))
do.call(rbind, split_df)
# line group
# 0.1 1 1.0 Group A
# 0.2 2 2.0 Group B
# 1.3 NA 3.0 Group C
# 1.1 NA Not applicable
# 1.4 4 4.0 Group D
# 1.5 5 5.0 Group E
# 2.6 NA 6.0 Group F
# 2.1 NA Not applicable
# 2.7 7 7.0 Group G
Note that I converted group
to character
to make adding new values easy, and I put NA
s in the line
column - you can't just have blanks in a numeric vector, every element needs to either be a number or NA
.
Upvotes: 4