pdubois

Reputation: 7800

Why doesn't this remove rows with zeros in a data frame?

I have the following data frame:

dat <- structure(list(V1 = structure(c(11L, 11L, 11L, 11L, 11L, 11L, 
11L, 11L, 11L, 11L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), .Label = c("XXX_LN_06.ID", 
"xxx_LN_06.ID", "aaa_LN_06.ID", "bbb_LN_06.ID", "ccc_LN_06.ID", 
"ddd_LN_06.ID", "eee_LN_06.ID", "fff_LN_06.ID", "ggg_LN_06.IN", 
"hhh_LN_06.ID", "iii_LN_06.ID", "jjj_LN_06.ID", "kkk_LN_06.ID", 
"lll_LN_06.ID", "mmm_LN_06.ID", "nnn_LN_06.ID", "ooo_LN_06.ID", 
"ppp_LN_06.ID", "qqq_IC_LN_06.ID", "rrr_LN_06.ID", "sss_LN_06.ID"
), class = "factor"), V2 = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("Bcells", 
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils", 
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"), 
    V3 = c(4474.2737, 5893.97307, 9414.21112, 5743.65136, 4100.84016, 
    7280.7078, 5317.92682, 11905.14762, 4697.03516, 4661.754, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0), V4 = c(1.501, 1.978, 3.159, 
    1.927, 1.376, 2.443, 1.785, 3.995, 1.576, 1.564, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0)), .Names = c("V1", "V2", "V3", "V4"), row.names = 191:210, class = "data.frame")

Displayed as:

> dat
              V1             V2        V3    V4
191 iii_LN_06.ID         Bcells  4474.274 1.501
192 iii_LN_06.ID DendriticCells  5893.973 1.978
193 iii_LN_06.ID    Macrophages  9414.211 3.159
194 iii_LN_06.ID      Monocytes  5743.651 1.927
195 iii_LN_06.ID        NKCells  4100.840 1.376
196 iii_LN_06.ID    Neutrophils  7280.708 2.443
197 iii_LN_06.ID      StemCells  5317.927 1.785
198 iii_LN_06.ID   StromalCells 11905.148 3.995
199 iii_LN_06.ID       abTcells  4697.035 1.576
200 iii_LN_06.ID       gdTCells  4661.754 1.564
201 ggg_LN_06.IN         Bcells     0.000 0.000
202 ggg_LN_06.IN DendriticCells     0.000 0.000
203 ggg_LN_06.IN    Macrophages     0.000 0.000
204 ggg_LN_06.IN      Monocytes     0.000 0.000
205 ggg_LN_06.IN        NKCells     0.000 0.000
206 ggg_LN_06.IN    Neutrophils     0.000 0.000
207 ggg_LN_06.IN      StemCells     0.000 0.000
208 ggg_LN_06.IN   StromalCells     0.000 0.000
209 ggg_LN_06.IN       abTcells     0.000 0.000
210 ggg_LN_06.IN       gdTCells     0.000 0.000

What I want to do is remove the rows that contain zeros, yielding:

> dat
              V1             V2        V3    V4
191 iii_LN_06.ID         Bcells  4474.274 1.501
192 iii_LN_06.ID DendriticCells  5893.973 1.978
193 iii_LN_06.ID    Macrophages  9414.211 3.159
194 iii_LN_06.ID      Monocytes  5743.651 1.927
195 iii_LN_06.ID        NKCells  4100.840 1.376
196 iii_LN_06.ID    Neutrophils  7280.708 2.443
197 iii_LN_06.ID      StemCells  5317.927 1.785
198 iii_LN_06.ID   StromalCells 11905.148 3.995
199 iii_LN_06.ID       abTcells  4697.035 1.576
200 iii_LN_06.ID       gdTCells  4661.754 1.564

Why doesn't this do the job?

row_sub = apply(dat, 1, function(row) any(row ==0 ))
dat[row_sub,]

Upvotes: 1

Views: 100

Answers (2)

LyzandeR

Reputation: 37879

You could try this:

a <- which(dat == 0, arr.ind = TRUE)

dat[-a[,1],]

Or as per @David's comment below:

dat[rowSums(dat == 0L) == 0L, ]

Or:

dat[!rowSums(dat == 0L), ]

Output:

> dat[-a[,1],]
              V1             V2        V3    V4
191 iii_LN_06.ID         Bcells  4474.274 1.501
192 iii_LN_06.ID DendriticCells  5893.973 1.978
193 iii_LN_06.ID    Macrophages  9414.211 3.159
194 iii_LN_06.ID      Monocytes  5743.651 1.927
195 iii_LN_06.ID        NKCells  4100.840 1.376
196 iii_LN_06.ID    Neutrophils  7280.708 2.443
197 iii_LN_06.ID      StemCells  5317.927 1.785
198 iii_LN_06.ID   StromalCells 11905.148 3.995
199 iii_LN_06.ID       abTcells  4697.035 1.576
200 iii_LN_06.ID       gdTCells  4661.754 1.564
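One caveat with the negative-index form is worth sketching. The toy data frame below is hypothetical (it is not the asker's `dat`) and contains no zeros at all. In that situation `which()` finds nothing, `-a[, 1]` is an empty index, and every row is dropped, whereas the `rowSums()` form keeps all rows.

```r
# Hypothetical toy data with no zero cells
toy <- data.frame(x = c(1, 2), y = c(3, 4))

# Negative indexing: the index vector is empty, so no rows are selected
a <- which(toy == 0, arr.ind = TRUE)
dropped <- toy[-a[, 1], ]

# Logical indexing via rowSums(): every row passes the test
kept <- toy[rowSums(toy == 0) == 0, ]

nrow(dropped)
nrow(kept)
```

For that reason the `rowSums()` variants are probably the safer default when the data may or may not contain zeros.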

Problem in your case:

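In your case, row_sub contains only FALSE values, so dat[row_sub, ] returns no rows: logical indexing keeps the rows where the vector is TRUE. Note also that even with a working test, row_sub would mark the rows that contain zeros, so you would need dat[!row_sub, ] to remove them.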

> row_sub
  191   192   193   194   195   196   197   198   199   200   201   202   203 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
  204   205   206   207   208   209   210 
FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
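If you would rather keep the `apply()` approach, one possible repair is to run it over the numeric columns only and then negate the result. This is a minimal sketch on a hypothetical stand-in data frame (`toy`, with made-up labels), not the asker's full `dat`; the names `num_cols`, `has_zero` and `toy_clean` are introduced here for illustration.

```r
# Hypothetical stand-in with the same column types as dat
toy <- data.frame(
  V1 = factor(c("iii", "iii", "ggg")),
  V2 = factor(c("Bcells", "NKCells", "Bcells")),
  V3 = c(4474.274, 4100.840, 0),
  V4 = c(1.501, 1.376, 0)
)

# Select the numeric columns so that apply() works on a numeric matrix
num_cols <- vapply(toy, is.numeric, logical(1))

# TRUE for each row that contains at least one zero
has_zero <- apply(toy[, num_cols, drop = FALSE], 1, function(row) any(row == 0))

# Negate the test to drop those rows rather than select them
toy_clean <- toy[!has_zero, ]
toy_clean
```

The `drop = FALSE` keeps the subset a data frame even if only one numeric column is present, so `apply()` still receives two dimensions.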

Upvotes: 4

LauriK

Reputation: 1929

Because apply() converts the data frame to a matrix first (via as.matrix()), and since V1 and V2 are factors, every column ends up as character. You can (and should) debug these things first, like this:

# str() prints its own output, so wrapping it in print() only adds a stray NULL
invisible(apply(dat, 1, str))

And part of the output is this:

 Named chr [1:4] "ggg_LN_06.IN" "StromalCells" "    0.000" "0.000"
 - attr(*, "names")= chr [1:4] "V1" "V2" "V3" "V4"

Here you can see that every value, including the numbers, is a character string, so comparing it with 0 never matches.
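The coercion can also be checked directly with `as.matrix()`, which is what `apply()` calls on a data frame. The two-row data frame below is a hypothetical example; the point is that a mixed factor/numeric frame becomes a character matrix, while a numeric-only subset stays numeric.

```r
# Hypothetical two-row example mixing a factor and a numeric column
toy <- data.frame(
  V1 = factor(c("iii", "ggg")),
  V3 = c(4474.274, 0)
)

m <- as.matrix(toy)          # the conversion apply() performs internally
is.character(m)              # the whole matrix is character

# The zero is now a formatted, padded string, so it does not equal "0"
zero_matches <- m[2, "V3"] == 0
zero_matches

# A numeric-only subset is converted to a numeric matrix instead
is.numeric(as.matrix(toy["V3"]))
```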

Upvotes: 4
