Reputation: 3088

Change the value of a column in R based on values of two other columns

I am trying to change the value of one column based on the values of two other columns. It gives me a bit of a headache so far, and I am not sure if it's even possible.

My data set looks like this. One column is the time, and the other two columns reflect an offspring parent relationship. In weird occasions like in the time point 1 I have the offspring "D" which pops up for the first time in the data set, and it has not been there in the previous time point to serve both as an offspring and father at the same time.

Data

structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L), 
    offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"
    ), parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame", row.names = c(NA, 
-9L))

what I would like help to is

to find all the offspring that exist in a one-time point but not the previous one (without taking into consideration time point 0) and serve as both offspring and father like the D and F
When I find them, I would like to reduce one exact time point by 0.5

time  offspring  parent
 0       A        NA
 0       B        NA
 0       C        NA
 1       A        A
 0.5     D        B 
 1       E        D
 2       A        A
 1.5     F        A
 2       G        F

Any help or guidance in this issue would be highly appreciated.

Upvotes: 1

Answers (3)

Dave2e

Reputation: 24079

Create 2 data frames finding the first appearances of each animal as both a parent and offspring.
Find the times and animals occurring in both combines columns and then update the times in the original dataframe.

df <-structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L), 
                    offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"), 
                    parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame", 
                    row.names = c(NA, -9L))


library(dplyr)
#find the row where each Letter First appears as both a parent and offspring 
parents <-df %>% filter(complete.cases(.)) %>% group_by(parent) %>% slice(1) %>% select(time, parent)
offsprings <- df  %>% group_by(offspring) %>% slice(1) %>% select(time, offspring)

combined <- full_join(offsprings, parents)
#rows where the names match for both parent and offspring
matchingrows <-which(combined$parent == combined$offspring)

#update the times
for (i in matchingrows){
   row = which(df$time == combined$time[i] & df$offspring == combined$offspring[i])
   df$time[row] <- df$time[row] - 0.5
}
df

Upvotes: 1

daniellga

Reputation: 1224

In data.table if you want:

library(data.table)

DT <- data.table(time = c(0,0,0,1,1,1,2,2,2),
                 offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
                 parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))

for (i in seq_len(nrow(DT))) {
  DT[i, time := fifelse(time != 0 & offspring %chin% DT[, parent] & !(offspring %chin% DT[seq_len(i-1), offspring]),
                        time - 0.5,
                        time)]
}

> DT
   time offspring parent
1:  0.0         A   <NA>
2:  0.0         B   <NA>
3:  0.0         C   <NA>
4:  1.0         A      A
5:  0.5         D      B
6:  1.0         E      D
7:  2.0         A      A
8:  1.5         F      A
9:  2.0         G      F

with dplyr:

library(dplyr)
library(tibble)

tbl <- tibble(time = c(0,0,0,1,1,1,2,2,2),
              offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
              parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))

for (i in seq_len(nrow(tbl))) {
  tbl[i,][['time']] <- tbl[i, ] %>% mutate(time = if_else(time != 0 &
                                         offspring %in% tbl[['parent']] &
                                         !(offspring %in% tbl[seq_len(i-1),][['offspring']]),
                                       time - 0.5,
                                       time)) %>% pull(time)
}

> tbl
# A tibble: 9 x 3
   time offspring parent
  <dbl> <chr>     <chr> 
1   0   A         NA    
2   0   B         NA    
3   0   C         NA    
4   1   A         A     
5   0.5 D         B     
6   1   E         D     
7   2   A         A     
8   1.5 F         A     
9   2   G         F

Upvotes: 0

RStudious

Reputation: 166

My solution probably isn't the most concise, but I was able to make it work and it is generalizable for a larger data set. I'm sure there are ways to improve on this, so I'm curious to see what others come up with. First I was having trouble with 0 subscripts so I added 2 two the time column which I subtracted back at the end.

The idea is that I iterate through the rows and I find the individuals that were offspring in the current year (after year 0) but not the previous year. Then I check which of those individuals were also parents in the current year. I compile the rows for which these individuals were offspring in that time period into a vector because we'll delete them later. I then make a new row with the time-.5, that offspring, and its parent. I compile those into a new data frame which will replace the rows being deleted.

Because there's duplication for each time stamp, I make the vector of rows to delete and df of rows to add unique. I then make the deletions and additions to the original data frame and get the data types to agree.

    parent_offspring <- data.frame(
                 "time" = c( rep(0,3), rep(1,3), rep(2,3)),
                 "offspring" = c("A","B","C","A","D","E","A","F","G"),
                 "parent" = c(NA, NA, NA, "A","B","D","A","A","F")
          )

    po<- parent_offspring
    po$time <- po$time+2
    delete_vec <- vector()
    df_to_add <- data.frame()


    for (i in seq_along(po$time)) {
    q <- po$time[[i]] # Value of "Time" variable for the row
    a <- which(po$time == q) # Rows sharing that value of "Time"
    offspring_curr <- po$offspring[a]  # Offspring at that time
    b <- which(po$time==(q-1))  # Rows of offspring at Time-1
    offspring_prev <- po$offspring[b] # Identities of offspring at Time-1
    f<- offspring_curr[offspring_curr %in% offspring_prev == FALSE] # Which offspring at Time were not offspring at Time-1

    if (length(f) == 0) {
    next
    } else { ##skip ahead if none of the offspring at Time were not offspring at Time - 1

    parents_curr <- po$parent[which(po$time == q)] # Parents at current time
    parent_and_offpsring_curr <- intersect(f,parents_curr) # Which individuals are both parents and offspring at the current time

    if (length(parent_and_offpsring_curr) == 0) {
    next
    } else { ## skip ahead if no individuals are both parents and offspring

    g<- which(po$time==q & po$offspring==parent_and_offpsring_curr) # which offspring row is occupied by an individual who is both a parent and offspring at the current time
    delete_vec <- append(delete_vec,g) #we'll be deleting those rows in the end so we'll keep track of them and save them in a vector
    h<- po$parent[g] # this is the parent for the offspring/parent individual in the current time. 
    add_row<-c((q-.5), parent_and_offpsring_curr, h) # make a new row with the fractional time, parent/offspring individual, and their parent for row when the parent/offspring individual is an offspring
    df_to_add <- rbind(df_to_add,add_row) ## we'll add these rows at the end
  }
}
 }
 
    delete_vec<-unique(delete_vec) ## iteration gave us duplicates
    df_to_add <- unique(df_to_add) ## same as above
    colnames(df_to_add) <- colnames(po) ## fix column names for new df
    po<- po[-delete_vec,] ## remove the offspring rows for the parent/offspring individuals
    po<-rbind(po,df_to_add) ## add the rows with fractional times
    rownames(po) <- c(1:nrow(po)) ## fix the row numbers
    po$time<- as.numeric(po$time) ## time was converted to character when put into a vector with letters
    po$time <- po$time-2 ## back to the original time values

    po

        time offspring parent
      1  0.0         A   <NA>
      2  0.0         B   <NA>
      3  0.0         C   <NA>
      4  1.0         A      A
      5  1.0         E      D
      6  2.0         A      A
      7  2.0         G      F
      8  0.5         D      B
      9  1.5         F      A

You can then arrange the rows to be in ascending time order with dplyr::arrange

Upvotes: 0

Change the value of a column in R based on values of two other columns

Answers (3)

Related Questions