Connie

Reputation: 89

R: how to add rows based on the value in a column

I have a data frame that looks like this:

line = c(1, 2, NA, 4, 5, NA, 7)
group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D", "5.0  Group E", "6.0 Group F", "7.0 Group G")
df <- data.frame(line, group)

df
  line       group
1    1 1.0 Group A
2    2 2.0 Group B
3   NA 3.0 Group C
4    4 4.0 Group D
5    5 5.0 Group E
6   NA 6.0 Group F
7    7 7.0 Group G

What I want to do is find every NA value in the "line" column and insert a new row directly underneath each such row, with NA in "line" and "Not Applicable" in "group". The new data frame should look like this:

df
  line          group
1    1    1.0 Group A
2    2    2.0 Group B
3   NA    3.0 Group C
4   NA Not Applicable
5    4    4.0 Group D
6    5    5.0 Group E
7   NA    6.0 Group F
8   NA Not Applicable
9    7    7.0 Group G

I'm thinking about using an ifelse() statement or case_when() from dplyr, but I don't know how to make it work. Does anyone have any suggestions?

Thank you!
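On the ifelse() idea: ifelse() and case_when() return a vector the same length as their input, so on their own they can't add rows. They can, however, compute how many times each row should appear, and indexing with rep() does the actual duplicating. A minimal base R sketch, assuming the example data above (the names times, idx and out are just for illustration):

```r
# Example data from the question; stringsAsFactors = FALSE keeps group as
# character on R versions before 4.0.0 (later versions do this by default)
df <- data.frame(
  line  = c(1, 2, NA, 4, 5, NA, 7),
  group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D",
            "5.0  Group E", "6.0 Group F", "7.0 Group G"),
  stringsAsFactors = FALSE
)

# ifelse() yields a repeat count per row: 2 for NA rows, 1 otherwise
times <- ifelse(is.na(df$line), 2, 1)

# Repeat each row index; the copy of row i lands right after row i
idx <- rep(seq_len(nrow(df)), times = times)
out <- df[idx, ]

# duplicated(idx) is TRUE only for the second occurrence, i.e. the new rows
out$group[duplicated(idx)] <- "Not Applicable"
rownames(out) <- NULL
out
```

The copied rows keep NA in line automatically, so only group needs relabeling.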

Upvotes: 1

Views: 11615

Answers (4)

jackbdolg

Reputation: 115

I think the tidyr::uncount() function might be what you're looking for. Mark the rows where line is NA with a 2 in a new column, which we'll call n, and every other row with a 1; uncount() then repeats each row n times. Because uncount() places each copy directly below its original, the duplicated NA rows end up exactly where they should be, so there's no need to append them at the bottom of the data frame and re-sort. Finally, mutate_at() sets group to "Not Applicable" wherever both the current row's line and the previous row's line (lag(line)) are NA, which singles out the duplicated rows. I'd expect this approach to scale well, too.

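library(tidyverse)
df %>%
    modify_if(is.factor, as.character) %>%     # factor columns -> character, so a new label can be added
    mutate(n = ifelse(is.na(line), 2, 1)) %>%  # 2 copies of each NA row, 1 of every other row
    uncount(n) %>%                             # repeat rows; each copy sits right below its original
    # relabel rows whose line and previous line are both NA (the duplicates)
    mutate_at(vars(group), ~ifelse(is.na(line) & is.na(lag(line)), "Not Applicable", .))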
#   line          group
# 1    1    1.0 Group A
# 2    2    2.0 Group B
# 3   NA    3.0 Group C
# 4   NA Not Applicable
# 5    4    4.0 Group D
# 6    5   5.0  Group E
# 7   NA    6.0 Group F
# 8   NA Not Applicable
# 9    7    7.0 Group G
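One edge case worth knowing about: if the input has two NA rows next to each other, then after uncount() the original of the second NA row sits directly below the copy of the first, so the lag() test relabels that original row as well. A sketch that avoids this, assuming tidyr's uncount() .id argument (which numbers the copies of each row 1, 2, ...) and using small hypothetical data with two consecutive NA rows:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: rows 3 and 4 are both NA
df <- data.frame(
  line  = c(1, 2, NA, NA, 5),
  group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D", "5.0 Group E"),
  stringsAsFactors = FALSE
)

out <- df %>%
  mutate(n = if_else(is.na(line), 2, 1)) %>%        # 2 copies of NA rows, 1 otherwise
  uncount(n, .id = "copy") %>%                       # copy = 1 for the original, 2 for the duplicate
  mutate(group = if_else(copy == 2, "Not Applicable", group)) %>%  # relabel duplicates only
  select(-copy)
out
```

Keying on the copy number rather than on neighbouring rows means the result no longer depends on where the NA rows happen to sit.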

Upvotes: 1

wibeasley

Reputation: 5297

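Create a separate data frame, ds_blank, holding one "Not Applicable" row per NA row, each with an index halfway between that row and the next one (e.g. 3.5). Stack it onto the original with a union, then sort by that temporary index column and drop it.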

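library(magrittr)
# positions of the rows whose line is NA
na_index <- which(is.na(df$line))

# one "Not Applicable" row per NA row, indexed halfway to the next row (e.g. 3.5)
ds_blank <- tibble::tibble(
  index   = na_index + .5,
  line    = rep(NA_real_          , length(na_index)),
  group   = rep("Not Applicable"  , length(na_index))
)

df <- df %>% 
  tibble::rowid_to_column("index") %>%   # temporary row index 1, 2, 3, ...
  dplyr::union(ds_blank) %>%             # stack the blank rows onto the data
  dplyr::arrange(index) %>%              # each x.5 row sorts right after row x
  dplyr::select(-index)                  # drop the temporary index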

Result

> df
  line          group
1    1    1.0 Group A
2    2    2.0 Group B
3   NA    3.0 Group C
4   NA Not Applicable
5    4    4.0 Group D
6    5   5.0  Group E
7   NA    6.0 Group F
8   NA Not Applicable
9    7    7.0 Group G
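dplyr::union() also drops duplicate rows, which this task doesn't need: the fractional index already makes every stacked row distinct. A sketch of the same approach with dplyr::bind_rows() swapped in, which simply stacks the two data frames (assumes dplyr and tibble are installed; the name out is illustrative):

```r
library(dplyr)

df <- data.frame(
  line  = c(1, 2, NA, 4, 5, NA, 7),
  group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D",
            "5.0  Group E", "6.0 Group F", "7.0 Group G"),
  stringsAsFactors = FALSE
)

na_index <- which(is.na(df$line))

# One blank row per NA row; the length-1 columns are recycled by tibble()
ds_blank <- tibble::tibble(
  index = na_index + 0.5,
  line  = NA_real_,
  group = "Not Applicable"
)

out <- df %>%
  tibble::rowid_to_column("index") %>%  # integer index 1..n
  bind_rows(ds_blank) %>%               # stack; integer and double index combine to double
  arrange(index) %>%
  select(-index)
out
```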

I wanted to try tibble::add_row(), but it apparently can't insert rows at several different positions in one call: .before and .after take a single row index.

Secondary approach

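...using @Gregor's tip to use a for loop. Note that na_index is now sorted in decreasing order: inserting from the bottom up means each new row doesn't shift the positions of the NA rows still waiting to be processed.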

na_index <- sort(which(is.na(df$line)), decreasing = TRUE)
for( i in na_index ) {
  df <- df %>% 
    tibble::add_row(
      line    = NA_real_,          # line is a double column
      group   = "Not Applicable",
      .after  = i
    )
}
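For readers not using tibble, the same bottom-up loop can be sketched in base R, with rbind() and row slicing standing in for add_row(). This assumes group is a character column; blank is an illustrative name:

```r
df <- data.frame(
  line  = c(1, 2, NA, 4, 5, NA, 7),
  group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D",
            "5.0  Group E", "6.0 Group F", "7.0 Group G"),
  stringsAsFactors = FALSE
)

# Process NA rows from the bottom up so the remaining indices stay valid
na_index <- sort(which(is.na(df$line)), decreasing = TRUE)

for (i in na_index) {
  blank <- data.frame(line = NA_real_, group = "Not Applicable",
                      stringsAsFactors = FALSE)
  # rows 1..i, then the new row, then everything after row i
  # (df[-seq_len(i), ] is a zero-row data frame when i is the last row)
  df <- rbind(df[seq_len(i), ], blank, df[-seq_len(i), ])
}
rownames(df) <- NULL
df
```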

Upvotes: 3

MKR

Reputation: 20095

A solution can be achieved using dplyr.

The approach is simple: add a column holding the row number, copy the rows whose line value is NA, set group to "Not Applicable" in those copies and add 0.5 to their row number, then bind the two data frames, sort by row number, and drop the helper column.

library(dplyr)

df %>%
  mutate(rownum = row_number()) %>%
  # copy the NA rows, relabel them, and place each half a row below its original
  bind_rows(., filter(., is.na(line)) %>%
              mutate(group = "Not Applicable", rownum = rownum + .5)) %>%
  arrange(rownum) %>%
  select(-rownum)


# line          group
# 1    1    1.0 Group A
# 2    2    2.0 Group B
# 3   NA    3.0 Group C
# 4   NA Not Applicable
# 5    4    4.0 Group D
# 6    5   5.0  Group E
# 7   NA    6.0 Group F
# 8   NA Not Applicable
# 9    7    7.0 Group G
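The same fractional-row-number trick works in base R with order(). A sketch under the same assumptions (group stored as character; rownum, extra and out are illustrative names):

```r
df <- data.frame(
  line  = c(1, 2, NA, 4, 5, NA, 7),
  group = c("1.0 Group A", "2.0 Group B", "3.0 Group C", "4.0 Group D",
            "5.0  Group E", "6.0 Group F", "7.0 Group G"),
  stringsAsFactors = FALSE
)

df$rownum <- seq_len(nrow(df))

# Copies of the NA rows, relabeled and numbered half a row later
extra <- df[is.na(df$line), ]
extra$group <- "Not Applicable"
extra$rownum <- extra$rownum + 0.5

# Stack, sort by the helper column, then drop it
out <- rbind(df, extra)
out <- out[order(out$rownum), c("line", "group")]
rownames(out) <- NULL
out
```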

The limitations @Gregor mentions apply here too: a numeric column can hold NA, but not a blank.

Upvotes: 3

Gregor Thomas

Reputation: 146224

Here's a base R method: split the data by the cumulative NA count, add the new rows, and recombine.

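    df$group = as.character(df$group)
    # start a new chunk at every NA row
    split_df = split(df, cumsum(is.na(df$line)))
    # in each chunk that starts with an NA row, insert a "Not Applicable" row after it
    # (checking each chunk also covers data whose first row is NA)
    split_df = lapply(split_df, function(d) {
      if (!is.na(d$line[1])) return(d)
      rbind(d[1, ], data.frame(line = NA, group = "Not Applicable", stringsAsFactors = FALSE), d[-1, ])
    })
    do.call(rbind, split_df)
    #     line          group
    # 0.1    1    1.0 Group A
    # 0.2    2    2.0 Group B
    # 1.3   NA    3.0 Group C
    # 1.1   NA Not Applicable
    # 1.4    4    4.0 Group D
    # 1.5    5   5.0  Group E
    # 2.6   NA    6.0 Group F
    # 2.1   NA Not Applicable
    # 2.7    7    7.0 Group G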

Note that I converted group to character to make adding new values easy, and I put NAs in the line column: you can't have blanks in a numeric vector, because every element must be either a number or NA.
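To illustrate why a blank can't simply be stored in line: an R vector holds a single type, so combining a number with "" coerces the whole vector to character, whereas NA fits in a numeric vector. A small sketch (x, y and d are just example names):

```r
x <- c(1, 2, NA)
class(x)
# [1] "numeric"

y <- c(1, 2, "")
class(y)
# [1] "character"
y
# [1] "1" "2" ""

# The same happens to a data frame column when a blank is assigned into it
d <- data.frame(line = c(1, 2, NA))
d$line[3] <- ""
class(d$line)
# [1] "character"
```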

Upvotes: 4
