Reputation: 3088
I am trying to change the value of one column based on the values of two other columns. It gives me a bit of a headache so far, and I am not sure if it's even possible.
My data set looks like this. One column is the time, and the other two columns reflect an offspring parent relationship. In weird occasions like in the time point 1 I have the offspring "D" which pops up for the first time in the data set, and it has not been there in the previous time point to serve both as an offspring and father at the same time.
Data
structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L),
offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"
), parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame", row.names = c(NA,
-9L))
what I would like help to is
to find all the offspring that exist in a one-time point but not the previous one (without taking into consideration time point 0) and serve as both offspring and father like the D and F
When I find them, I would like to reduce one exact time point by 0.5
time offspring parent
0 A NA
0 B NA
0 C NA
1 A A
0.5 D B
1 E D
2 A A
1.5 F A
2 G F
Any help or guidance in this issue would be highly appreciated.
Upvotes: 1
Views: 819
Reputation: 24079
Create 2 data frames finding the first appearances of each animal as both a parent and offspring.
Find the times and animals occurring in both combines columns and then update the times in the original dataframe.
df <-structure(list(time = c(0L, 0L, 0L, 1L, 1L, 1L, 2L, 2L, 2L),
offspring = c("A", "B", "C", "A", "D", "E", "A", "F", "G"),
parent = c(NA, NA, NA, "A", "B", "D", "A", "A", "F")), class = "data.frame",
row.names = c(NA, -9L))
library(dplyr)
#find the row where each Letter First appears as both a parent and offspring
parents <-df %>% filter(complete.cases(.)) %>% group_by(parent) %>% slice(1) %>% select(time, parent)
offsprings <- df %>% group_by(offspring) %>% slice(1) %>% select(time, offspring)
combined <- full_join(offsprings, parents)
#rows where the names match for both parent and offspring
matchingrows <-which(combined$parent == combined$offspring)
#update the times
for (i in matchingrows){
row = which(df$time == combined$time[i] & df$offspring == combined$offspring[i])
df$time[row] <- df$time[row] - 0.5
}
df
Upvotes: 1
Reputation: 1224
In data.table if you want:
library(data.table)
DT <- data.table(time = c(0,0,0,1,1,1,2,2,2),
offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))
for (i in seq_len(nrow(DT))) {
DT[i, time := fifelse(time != 0 & offspring %chin% DT[, parent] & !(offspring %chin% DT[seq_len(i-1), offspring]),
time - 0.5,
time)]
}
> DT
time offspring parent
1: 0.0 A <NA>
2: 0.0 B <NA>
3: 0.0 C <NA>
4: 1.0 A A
5: 0.5 D B
6: 1.0 E D
7: 2.0 A A
8: 1.5 F A
9: 2.0 G F
with dplyr:
library(dplyr)
library(tibble)
tbl <- tibble(time = c(0,0,0,1,1,1,2,2,2),
offspring = c('A', 'B', 'C', 'A', 'D', 'E', 'A', 'F', 'G'),
parent = c(NA, NA, NA, 'A', 'B', 'D', 'A', 'A', 'F'))
for (i in seq_len(nrow(tbl))) {
tbl[i,][['time']] <- tbl[i, ] %>% mutate(time = if_else(time != 0 &
offspring %in% tbl[['parent']] &
!(offspring %in% tbl[seq_len(i-1),][['offspring']]),
time - 0.5,
time)) %>% pull(time)
}
> tbl
# A tibble: 9 x 3
time offspring parent
<dbl> <chr> <chr>
1 0 A NA
2 0 B NA
3 0 C NA
4 1 A A
5 0.5 D B
6 1 E D
7 2 A A
8 1.5 F A
9 2 G F
Upvotes: 0
Reputation: 166
My solution probably isn't the most concise, but I was able to make it work and it is generalizable for a larger data set. I'm sure there are ways to improve on this, so I'm curious to see what others come up with. First I was having trouble with 0 subscripts so I added 2 two the time column which I subtracted back at the end.
The idea is that I iterate through the rows and I find the individuals that were offspring in the current year (after year 0) but not the previous year. Then I check which of those individuals were also parents in the current year. I compile the rows for which these individuals were offspring in that time period into a vector because we'll delete them later. I then make a new row with the time-.5, that offspring, and its parent. I compile those into a new data frame which will replace the rows being deleted.
Because there's duplication for each time stamp, I make the vector of rows to delete and df of rows to add unique. I then make the deletions and additions to the original data frame and get the data types to agree.
parent_offspring <- data.frame(
"time" = c( rep(0,3), rep(1,3), rep(2,3)),
"offspring" = c("A","B","C","A","D","E","A","F","G"),
"parent" = c(NA, NA, NA, "A","B","D","A","A","F")
)
po<- parent_offspring
po$time <- po$time+2
delete_vec <- vector()
df_to_add <- data.frame()
for (i in seq_along(po$time)) {
q <- po$time[[i]] # Value of "Time" variable for the row
a <- which(po$time == q) # Rows sharing that value of "Time"
offspring_curr <- po$offspring[a] # Offspring at that time
b <- which(po$time==(q-1)) # Rows of offspring at Time-1
offspring_prev <- po$offspring[b] # Identities of offspring at Time-1
f<- offspring_curr[offspring_curr %in% offspring_prev == FALSE] # Which offspring at Time were not offspring at Time-1
if (length(f) == 0) {
next
} else { ##skip ahead if none of the offspring at Time were not offspring at Time - 1
parents_curr <- po$parent[which(po$time == q)] # Parents at current time
parent_and_offpsring_curr <- intersect(f,parents_curr) # Which individuals are both parents and offspring at the current time
if (length(parent_and_offpsring_curr) == 0) {
next
} else { ## skip ahead if no individuals are both parents and offspring
g<- which(po$time==q & po$offspring==parent_and_offpsring_curr) # which offspring row is occupied by an individual who is both a parent and offspring at the current time
delete_vec <- append(delete_vec,g) #we'll be deleting those rows in the end so we'll keep track of them and save them in a vector
h<- po$parent[g] # this is the parent for the offspring/parent individual in the current time.
add_row<-c((q-.5), parent_and_offpsring_curr, h) # make a new row with the fractional time, parent/offspring individual, and their parent for row when the parent/offspring individual is an offspring
df_to_add <- rbind(df_to_add,add_row) ## we'll add these rows at the end
}
}
}
delete_vec<-unique(delete_vec) ## iteration gave us duplicates
df_to_add <- unique(df_to_add) ## same as above
colnames(df_to_add) <- colnames(po) ## fix column names for new df
po<- po[-delete_vec,] ## remove the offspring rows for the parent/offspring individuals
po<-rbind(po,df_to_add) ## add the rows with fractional times
rownames(po) <- c(1:nrow(po)) ## fix the row numbers
po$time<- as.numeric(po$time) ## time was converted to character when put into a vector with letters
po$time <- po$time-2 ## back to the original time values
po
time offspring parent
1 0.0 A <NA>
2 0.0 B <NA>
3 0.0 C <NA>
4 1.0 A A
5 1.0 E D
6 2.0 A A
7 2.0 G F
8 0.5 D B
9 1.5 F A
You can then arrange the rows to be in ascending time order with dplyr::arrange
Upvotes: 0