Reputation: 89
I have a large data frame with observations that are positive and negative. I want to only keep rows that have at least one negative value in one of the columns (since that indicates something is wrong with the data and needs to be cleaned). An example of the data frame is below.
Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered State
1 0 1 0 0 0 0 Alabama
2 0 0 0 0 0 0 Alaska
3 0 0 0 1 0 0 Arizona
4 0 -61 0 0 0 0 Arkansas
5 21 41 0 0 0 9 California
6 0 15 0 0 -1 1 Colorado
7 2 15 0 0 1 21
So I would want to keep an observation like Arkansas, which contains a negative value of -61 in one of its columns, but remove one like California, where none of the variables are negative.
Anyone know of a good way to do this? Thanks for the help!
Upvotes: 0
Views: 1481
Reputation: 388982
You can use rowSums
to count number of values that are less than 0 and select rows where the value is atleast more than 1.
cols <- sapply(df, is.numeric)
df[rowSums(df[cols] < 0) > 0, ]
# Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered State
#4 0 -61 0 0 0 0 Arkansas
#6 0 15 0 0 -1 1 Colorado
In dplyr
, you can use filter_if
which has been super-seeded but I still find it useful and simple to use in row-wise operations.
library(dplyr)
df %>% filter_if(cols, any_vars(. < 0))
Upvotes: 2
Reputation: 5138
Another base solution to find rowwise negative values: you can do.call()
with pmax()
(which flags values smaller than 0
in the list produced by lapply()
). This should scale reasonably well for a large dataset, too.
df1[as.logical(do.call(pmax, lapply(df1[sapply(df1, is.numeric)], `<`, 0))), ]
Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered State
4 0 -61 0 0 0 0 Arkansas
6 0 15 0 0 -1 1 Colorado
Data:
df1 <- read.table(header = TRUE, text = "Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered State
1 0 1 0 0 0 0 Alabama
2 0 0 0 0 0 0 Alaska
3 0 0 0 1 0 0 Arizona
4 0 -61 0 0 0 0 Arkansas
5 21 41 0 0 0 9 California
6 0 15 0 0 -1 1 Colorado
7 2 15 0 0 1 21 Misc")
Upvotes: 0
Reputation: 39595
Try this base R
solution:
#Create index
index <- which(names(df)=='State')
#Data
df$Var <- apply(df[,-index],1,function (x) length(which(x<0)))
#Filter
df2 <- df[df$Var!=0,]
Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered State
4 0 -61 0 0 0 0 Arkansas
6 0 15 0 0 -1 1 Colorado
Upvotes: 1