MMYang
MMYang

Reputation: 33

Selecting between rows in a dataframe

I am working with a rather noisy data set and I was wondering if there was a good way to selectively choose between two rows of data within a group or leave them alone. Logic-wise I want to filter by group and then build an if-else type control structure to compare rows based on the value of a second column.

Example:

Row ID  V1  V2  
1   1  blah 1.2 
2   1  blah NA 
3   2  foo  2.3 
4   3  bar  NA
5   3  bar  NA

I want to group by ID (1, 2, 3) then go to column V2 and choose for example, row 1 over row 2 because row 2 has NA. But for rows 4 and 5, where both are 'NA' I want to just leave them alone.

Thanks,

Upvotes: 2

Views: 54

Answers (2)

www
www

Reputation: 39154

A solution using purrr. The idea is to split the data frame by ID. After that, apply a user-defineed function, which evaluates if all the elements in V2 are NA. If TRUE, returns the original data frame, otherwise returns the subset of the data frame by filtering out rows with NA using na.omit. map_dfr is similar to lapply, but it can combine all the data frames in a list automatically. dt2 is the final output.

library(purrr)

dt2 <- dt %>%
  split(.$ID) %>%
  map_dfr(function(x){
    if(all(is.na(x$V2))){
      return(x)
    } else {
      return(na.omit(x))
    }
  })
dt2
#   Row ID   V1  V2
# 1   1  1 blah 1.2
# 2   3  2  foo 2.3
# 3   4  3  bar  NA
# 4   5  3  bar  NA

DATA

dt <- read.table(text = "Row ID  V1  V2  
1   1  blah 1.2 
2   1  blah NA 
3   2  foo  2.3 
4   3  bar  NA
5   3  bar  NA",
                 header = TRUE, stringsAsFactors = FALSE)

Upvotes: 0

Christoph
Christoph

Reputation: 7063

What you need might really depends on what you exactly have. In case of NAs, this might help:

df <- data.frame(
  Row = c(1, 2, 3, 4, 5),
  ID =  c(1, 1, 2, 3, 3),
  V1 = c("bla", "bla", "foo", "bla", "bla"),
  V2 = c(1.2, NA, 2.3, NA, NA), 
  stringsAsFactors = FALSE)
df <- df[complete.cases(df), ]

Upvotes: 1

Related Questions