Reputation: 33
I am working with a rather noisy data set and I was wondering if there was a good way to selectively choose between two rows of data within a group or leave them alone. Logic-wise I want to filter by group and then build an if-else type control structure to compare rows based on the value of a second column.
Example:
Row ID V1 V2
1 1 blah 1.2
2 1 blah NA
3 2 foo 2.3
4 3 bar NA
5 3 bar NA
I want to group by ID (1, 2, 3) then go to column V2 and choose for example, row 1 over row 2 because row 2 has NA. But for rows 4 and 5, where both are 'NA' I want to just leave them alone.
Thanks,
Upvotes: 2
Views: 54
Reputation: 39154
A solution using purrr
. The idea is to split
the data frame by ID
. After that, apply a user-defineed function, which evaluates if all the elements in V2
are NA
. If TRUE
, returns the original data frame, otherwise returns the subset of the data frame by filtering out rows with NA
using na.omit
. map_dfr
is similar to lapply
, but it can combine all the data frames in a list automatically. dt2
is the final output.
library(purrr)
dt2 <- dt %>%
split(.$ID) %>%
map_dfr(function(x){
if(all(is.na(x$V2))){
return(x)
} else {
return(na.omit(x))
}
})
dt2
# Row ID V1 V2
# 1 1 1 blah 1.2
# 2 3 2 foo 2.3
# 3 4 3 bar NA
# 4 5 3 bar NA
DATA
dt <- read.table(text = "Row ID V1 V2
1 1 blah 1.2
2 1 blah NA
3 2 foo 2.3
4 3 bar NA
5 3 bar NA",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 0
Reputation: 7063
What you need might really depends on what you exactly have. In case of NAs, this might help:
df <- data.frame(
Row = c(1, 2, 3, 4, 5),
ID = c(1, 1, 2, 3, 3),
V1 = c("bla", "bla", "foo", "bla", "bla"),
V2 = c(1.2, NA, 2.3, NA, NA),
stringsAsFactors = FALSE)
df <- df[complete.cases(df), ]
Upvotes: 1