babybonobo
babybonobo

Reputation: 89

Keeping Only Negative Values Across Multiple Columns in Data Frame

I have a large data frame with observations that are positive and negative. I want to only keep rows that have at least one negative value in one of the columns (since that indicates something is wrong with the data and needs to be cleaned). An example of the data frame is below.

Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered          State
1                0                   1            0               0               0                   0        Alabama
2                0                   0            0               0               0                   0         Alaska
3                0                   0            0               1               0                   0        Arizona
4                0                 -61            0               0               0                   0       Arkansas
5               21                  41            0               0               0                   9     California
6                0                  15            0               0              -1                   1       Colorado
7                2                  15            0               0               1                  21 

So I would want to keep an observation like Arkansas, which contains a negative value of -61 in one of its columns, but remove one like California, where none of the variables are negative.

Anyone know of a good way to do this? Thanks for the help!

Upvotes: 0

Views: 1481

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

You can use rowSums to count number of values that are less than 0 and select rows where the value is atleast more than 1.

cols <- sapply(df, is.numeric)
df[rowSums(df[cols] < 0) > 0, ]


#  Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered    State
#4               0                 -61            0               0               0                   0 Arkansas
#6               0                  15            0               0              -1                   1 Colorado

In dplyr, you can use filter_if which has been super-seeded but I still find it useful and simple to use in row-wise operations.

library(dplyr)
df %>% filter_if(cols, any_vars(. < 0))

Upvotes: 2

Andrew
Andrew

Reputation: 5138

Another base solution to find rowwise negative values: you can do.call() with pmax() (which flags values smaller than 0 in the list produced by lapply()). This should scale reasonably well for a large dataset, too.

df1[as.logical(do.call(pmax, lapply(df1[sapply(df1, is.numeric)], `<`, 0))), ]

  Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered    State
4               0                 -61            0               0               0                   0 Arkansas
6               0                  15            0               0              -1                   1 Colorado

Data:

df1 <- read.table(header = TRUE, text = "Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered          State
1                0                   1            0               0               0                   0        Alabama
2                0                   0            0               0               0                   0         Alaska
3                0                   0            0               1               0                   0        Arizona
4                0                 -61            0               0               0                   0       Arkansas
5               21                  41            0               0               0                   9     California
6                0                  15            0               0              -1                   1       Colorado
7                2                  15            0               0               1                  21           Misc")

Upvotes: 0

Duck
Duck

Reputation: 39595

Try this base R solution:

#Create index
index <- which(names(df)=='State')
#Data
df$Var <- apply(df[,-index],1,function (x) length(which(x<0)))
#Filter
df2 <- df[df$Var!=0,]

Staff.Confirmed Residents.Confirmed Staff.Deaths Resident.Deaths Staff.Recovered Residents.Recovered    State
4               0                 -61            0               0               0                   0 Arkansas
6               0                  15            0               0              -1                   1 Colorado

Upvotes: 1

Related Questions