star

Reputation: 775

Give value to specific rows in data.frame in R

I have a table like this:

                    names                   ranges          strand
1                               [      1, 3073252]               +
2        ENSMUSG00000102693     [3073253, 3074322]               +
3                               [3074323, 3102015]               +
4        ENSMUSG00000102693     [3102016, 3102125]               +
5                               [3102126, 3252756]               + 
6        ENSMUSG00000095366     [90667525, 90667625]             -
7                               [90667626, 90754512]             -
8        ENSMUSG00000095366     [90754513, 90754821]             -
9                               [90754822, 90838868]             -
10       ENSMUSG00000096850     [90838869, 90839177]             -

Only some rows have a value in "names". I want to fill in the "names" column as follows: if the rows directly above and below an empty row have the same name (e.g. rows 2 and 4), the row in between (row 3) should take that name with "_new" appended.

For example:

                    names                   ranges          strand
1                               [      1, 3073252]               +
2        ENSMUSG00000102693     [3073253, 3074322]               +
3        ENSMUSG00000102693_new [3074323, 3102015]               +
4        ENSMUSG00000102693     [3102016, 3102125]               +
5                               [3102126, 3252756]               + 
6        ENSMUSG00000095366     [90667525, 90667625]             -
7        ENSMUSG00000095366_new [90667626, 90754512]             -
8        ENSMUSG00000095366     [90754513, 90754821]             -
9                               [90754822, 90838868]             -
10       ENSMUSG00000096850     [90838869, 90839177]             -

Thanks.

Upvotes: 0

Views: 243

Answers (2)

ytk

Reputation: 2827

Another possible solution using lead and lag:

library(dplyr)  # provides lag() and lead() for vectors
names <- c('', 'ENSMUSG00000102693', '', 'ENSMUSG00000102693', '', 'ENSMUSG00000095366', '', 'ENSMUSG00000095366', '', 'ENSMUSG00000096850')
df <- data.frame(names, stringsAsFactors = FALSE)
prev <- lag(df$names, default = '1')   # name in the row above
nxt <- lead(df$names, default = '2')   # name in the row below
# Fill only empty rows whose two neighbours share the same non-empty name
df$names <- ifelse(df$names == '' & prev == nxt & prev != '', paste0(prev, '_new'), df$names)
df
##                    names
##1                        
##2      ENSMUSG00000102693
##3  ENSMUSG00000102693_new
##4      ENSMUSG00000102693
##5                        
##6      ENSMUSG00000095366
##7  ENSMUSG00000095366_new
##8      ENSMUSG00000095366
##9                        
##10     ENSMUSG00000096850

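For each row, this checks whether the previous and next values are identical and not empty strings. If both conditions hold, the row is set to the previous value with _new appended. The different default values ('1' and '2') ensure that the first and last rows, which have only one neighbour, never match.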
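
A minimal base R sketch of the same neighbour check, for cases where loading dplyr is not wanted. It assumes missing names are stored as empty strings; fill_between(), its suffix argument and gene_names are illustrative names, not functions or objects from any package:

```r
# Hypothetical helper: fill an empty entry when the entries directly
# above and below it hold the same non-empty name.
fill_between <- function(x, suffix = "_new") {
  n <- length(x)
  if (n < 3) return(x)               # no entry has two neighbours
  prev <- c(NA_character_, x[-n])    # value one position above
  nxt  <- c(x[-1], NA_character_)    # value one position below
  # which() drops NA comparisons, so the first and last entries never match
  hit <- which(x == "" & prev != "" & prev == nxt)
  x[hit] <- paste0(prev[hit], suffix)
  x
}

gene_names <- c("", "ENSMUSG00000102693", "", "ENSMUSG00000102693", "",
                "ENSMUSG00000095366", "", "ENSMUSG00000095366", "",
                "ENSMUSG00000096850")
filled <- fill_between(gene_names)
filled  # entries 3 and 7 gain the "_new" suffix; all others are unchanged
```

Because only empty entries are overwritten, an entry that already has a name is left alone even if its two neighbours match each other.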

Upvotes: 2

Roland

Reputation: 132696

na.locf from the zoo package is a possibility here:

x <- c("a", NA, "a", NA, "b")
library(zoo)

fun <- function(x) {
  y <- na.locf(x) #last observation carried forward
  z <- na.locf(x, fromLast = TRUE) #next observation carried backward
  x[y == z] <- y[y == z]
  x
}

x1 <- fun(x)
#[1] "a" "a" "a" NA  "b"
x1[is.na(x) & !is.na(x1)] <- paste0(x1[is.na(x) & !is.na(x1)], "_new")
#[1] "a"     "a_new" "a"     NA      "b" 
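
To apply this to the "names" column from the question, two adjustments seem necessary, shown in the sketch below. It assumes the empty cells are empty strings that can be treated as NA; fill_between_locf() and gene_names are illustrative names. Since the column starts with an empty cell, na.rm = FALSE is passed so that na.locf keeps leading and trailing NAs and both filled vectors stay the same length as the input:

```r
library(zoo)

# Hypothetical helper: fill gaps that lie between two identical names.
fill_between_locf <- function(x, suffix = "_new") {
  x[!is.na(x) & x == ""] <- NA                     # treat empty strings as missing
  y <- na.locf(x, na.rm = FALSE)                   # last name carried forward
  z <- na.locf(x, fromLast = TRUE, na.rm = FALSE)  # next name carried backward
  # which() drops positions where y or z is still NA (edges of the vector)
  hit <- which(is.na(x) & y == z)
  x[hit] <- paste0(y[hit], suffix)
  x[is.na(x)] <- ""                                # restore remaining gaps
  x
}

gene_names <- c("", "ENSMUSG00000102693", "", "ENSMUSG00000102693", "",
                "ENSMUSG00000095366", "", "ENSMUSG00000095366", "",
                "ENSMUSG00000096850")
filled_locf <- fill_between_locf(gene_names)
filled_locf  # entries 3 and 7 gain the "_new" suffix; all others are unchanged
```

Unlike a check on the immediate neighbours only, this also fills runs of several consecutive empty rows when the nearest names on either side are the same.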

Upvotes: 1
