Cina
Cina

Reputation: 10199

add row based on variable condition in R

I have df as follow

df
ID   type  other-col
1    A1      cc
1    A2      dd
1    A3      cc
2    A1      cc
2    B1      aa
3    A2      aa

I want add new to when "ID" changes with the value of F for "type" and "other-col" columns

new_df
ID
  df
    ID   type  other-col
    1    A1      cc
    1    A2      dd
    1    A3      cc
    1    F       F     <- this row added
    2    A1      cc
    2    B1      aa
    2    F       F      <- this row added
    3    A2      aa

how can I do it in R? thx

Upvotes: 5

Views: 1499

Answers (7)

www
www

Reputation: 39154

Here is another approach with a similar idea as @akrun's answer.

library(tidyverse)

dat2 <- dat %>% 
  split(f = .$ID) %>%
  map_if(.p = function(x) unique(x$ID) < max(dat$ID),
         ~bind_rows(.x, tibble(ID = unique(.x$ID), type = "F", `other.col` = "F"))) %>%
  bind_rows()

dat2
#   ID type other.col
# 1  1   A1        cc
# 2  1   A2        dd
# 3  1   A3        cc
# 4  1    F         F
# 5  2   A1        cc
# 6  2   B1        aa
# 7  2    F         F
# 8  3   A2        aa

Data

dat <- read.table(text = "ID   type  other-col
1    A1      cc
1    A2      dd
1    A3      cc
2    A1      cc
2    B1      aa
3    A2      aa",
                  header = TRUE, stringsAsFactors = FALSE)

Update

I provided an updated answer to show that if ID column is not integer but character, we can create a new column (ID2 in this case) that is converted to be factor based on ID, and then convert it to integer. The rest of the operation would be similar to the original answer but based on ID2.

library(tidyverse)

dat2 <- dat %>%
  mutate(ID2 = as.integer(factor(ID, levels = unique(.$ID)))) %>%
  split(f = .$ID2) %>%
  map_if(.p = function(x) unique(x$ID2) != unique(last(.)$ID2),
         ~bind_rows(.x, tibble(ID = unique(.x$ID), type = "F", `other.col` = "F",
                               ID2 = unique(.x$ID2)))) %>%
  bind_rows() %>%
  select(-ID2)
dat2
#   ID type other.col
# 1  C   A1        cc
# 2  C   A2        dd
# 3  C   A3        cc
# 4  C    F         F
# 5  A   A1        cc
# 6  A   B1        aa
# 7  A    F         F
# 8  B   A2        aa

DATA

dat <- read.table(text = "ID   type  other-col
C    A1      cc
C    A2      dd
C    A3      cc
A    A1      cc
A    B1      aa
B    A2      aa",
                  header = TRUE, stringsAsFactors = FALSE)

Upvotes: 3

chinsoon12
chinsoon12

Reputation: 25225

A possible approach using data.table:

library(data.table)
m <- setDT(df)[, max(ID)]
df[, if (.BY$ID < m) rbind(.SD, as.list(rep("F", ncol(.SD)))) else .SD, ID]

output:

   ID type other-col
1:  1   A1        cc
2:  1   A2        dd
3:  1   A3        cc
4:  1    F         F
5:  2   A1        cc
6:  2   B1        aa
7:  2    F         F
8:  3   A2        aa

or if you dont mind adding another row at the bottom, code will be shorter: setDT(df)[, rbind(.SD, as.list(rep("F", ncol(.SD)))), ID]

Upvotes: 1

Shree
Shree

Reputation: 11140

Similar to akrun's answer but in base R. Basically, split dataframe by ID then rbind extra row to each split, then recombine dataframe and remove unrequired last row using head(..., -1) -

head(n = -1,
  do.call(rbind,
    lapply(split(dat, dat$ID), function(x) {
      rbind(x, c(x$ID[1], "F", "F"))
    })
  )
)

    ID type other.col
1.1  1   A1        cc
1.2  1   A2        dd
1.3  1   A3        cc
1.4  1    F         F
2.4  2   A1        cc
2.5  2   B1        aa
2.3  2    F         F
3.6  3   A2        aa

Upvotes: 2

d.b
d.b

Reputation: 32538

inds = head(cumsum(with(rle(df$ID), unlist(lapply(lengths, function(i) c((rep(1, i)), F = 0))))), -1)
df1 = df[inds,]
df1[which(names(inds) == "F"), c("type", "other_col")] = "F"
df1
#    ID type other_col
#1    1   A1        cc
#2    1   A2        dd
#3    1   A3        cc
#3.1  1    F         F
#4    2   A1        cc
#5    2   B1        aa
#5.1  2    F         F
#6    3   A2        aa

Upvotes: 1

Onyambu
Onyambu

Reputation: 79188

Using base R you could do:

 cbind(ID=sort(c(dat$ID,unique(dat$ID))),do.call(rbind,by(dat[-1],dat[1],rbind,'F')))

    ID type other.col
1.1  1   A1        cc
1.2  1   A2        dd
1.3  1   A3        cc
1.4  1    F         F
2.4  2   A1        cc
2.5  2   B1        aa
2.3  2    F         F
3.6  3   A2        aa
3.2  3    F         F

Or you could do:

do.call(rbind,by(dat,dat$ID,function(x)cbind(ID = unique(x[,1]),rbind(x[-1],"F"))))

Upvotes: 1

thelatemail
thelatemail

Reputation: 93803

This should be doable in a single replacement operation once you know the indexes of where each change occurs. E.g.:

idx <- match(unique(df$ID), df$ID)[-1] - 1

df <- df[sort(c(sequence(nrow(df)),idx)),]
df[seq_along(idx) + idx, c("type","other_col")] <- "F"

#    ID type other_col
#1    1   A1        cc
#2    1   A2        dd
#3    1   A3        cc
#3.1  1    F         F
#4    2   A1        cc
#5    2   B1        aa
#5.1  2    F         F
#6    3   A2        aa

Where df was:

df <- read.table(text="ID   type  other_col
1    A1      cc
1    A2      dd
1    A3      cc
2    A1      cc
2    B1      aa
3    A2      aa", header=TRUE, stringsAsFactors=FALSE)

Upvotes: 4

akrun
akrun

Reputation: 886938

An option with group_split and add_row. We can split by 'ID' with group_split into a list of data.frames, then loop through the list with map, add a row as the last row (add_row - by default adds row to the end, but we can control it with .before and .after), then slice out the last row as the last 'ID' didn't need the 'F' row

library(tidyverse)
df1 %>%
   group_split(ID) %>% 
   map_dfr(~ .x %>% 
               add_row(ID = first(.$ID), type = 'F', `other-col` = 'F'))  %>%
   slice(-n())

Upvotes: 4

Related Questions