Zeineb

Reputation: 69

Select negative values from a Data Frame, using R

I have the following data.frame called best100_gene:

best100_gene

I want to select only the rows where Data_PCA$ind$coord[, 2] < 0. I tried the following command:

gene_neg = best100_gene[which("Data_PCA$ind$coord[, 2]" < 0, )]

But it doesn't work! I tried several other options but they did not work either.

Upvotes: 2

Views: 16212

Answers (2)

Ben Bolker

Reputation: 226172

It's rather hard even to create data like yours. We need check.names=FALSE to create a data frame whose names contain $ and [, and back-ticks (`) to protect the odd names when referring to them ...

 best100_gene <- data.frame(
    SYMBOL=c("A", "b", "c", "d", "e"),
    Data_PCA_contrib=c(.26,.25,.36,.11,.35),
    `Data_PCA$ind$coord[, 2]`=c(12,15,-11,-11,-11),check.names=FALSE)

This is the closest to what you wanted ...

 best100_gene[best100_gene[,"Data_PCA$ind$coord[, 2]"]<0,]
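
To see why the original attempt selected nothing useful, here is a minimal sketch using the toy data above (the variable names col, is_neg and gene_neg are only illustrative). Wrapped in quotes, the column name is an ordinary character string, so comparing it with 0 tests the string itself rather than the column's values:

```r
# Assumed toy data, same shape as the example above;
# check.names = FALSE keeps the odd column name intact.
best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  `Data_PCA$ind$coord[, 2]` = c(12, 15, -11, -11, -11),
  check.names = FALSE
)

col <- "Data_PCA$ind$coord[, 2]"

# Quoted, the name is a length-one character string, so the comparison
# returns a single logical value, not one value per row.
length(col < 0)

# Used as a column index, the same string retrieves the numeric column,
# and the comparison returns one logical value per row.
is_neg <- best100_gene[, col] < 0
gene_neg <- best100_gene[is_neg, ]
gene_neg
```

Note also that the comma belongs outside which(), between the row index and the (empty) column index.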

You can also use

 subset(best100_gene,`Data_PCA$ind$coord[, 2]`<0)

or

 with(best100_gene,best100_gene[`Data_PCA$ind$coord[, 2]`<0,])

or

 dplyr::filter(best100_gene,`Data_PCA$ind$coord[, 2]`<0)

It would be better to rename your columns to something easier to handle, e.g.

 bb <- dplyr::rename(best100_gene,dpc2=`Data_PCA$ind$coord[, 2]`)

Or, even better, look farther back in your workflow and see where the weird names came from.
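
As a rough sketch of what fixing this upstream might look like, assume the column was built from a PCA result object called Data_PCA. The list below is only a hypothetical stand-in for that object, and pca_coord2 is an illustrative name. Giving the column a syntactic name when the data frame is created avoids the problem entirely:

```r
# Hypothetical stand-in for the PCA result; the real object presumably
# comes from whatever PCA function was used earlier in the workflow.
Data_PCA <- list(
  ind = list(
    coord = cbind(Dim.1 = c(1, 2, 3, 4, 5),
                  Dim.2 = c(12, 15, -11, -11, -11))
  )
)

# Name the column explicitly instead of letting R derive a name
# from the expression Data_PCA$ind$coord[, 2].
best100_gene <- data.frame(
  SYMBOL = c("A", "b", "c", "d", "e"),
  Data_PCA_contrib = c(.26, .25, .36, .11, .35),
  pca_coord2 = Data_PCA$ind$coord[, 2]
)

# With a plain name, ordinary $ indexing works without back-ticks.
gene_neg <- best100_gene[best100_gene$pca_coord2 < 0, ]
gene_neg
```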

Upvotes: 1

Hack-R

Reputation: 23214

best100_gene <- data.frame(
  SYMBOL=c("A", "b", "c", "d", "e"),
  Data_PCA_contrib=c(.26,.25,.36,.11,.35),
  "Data_PCA$ind$coord[, 2]"=c(12,15,-11,-11,-11)
)

Here's my example data based on your screenshot:

  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
1      A             0.26                      12
2      b             0.25                      15
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11
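
The third column name in the printout differs from the one typed in the call because data.frame() defaults to check.names = TRUE, which passes the names through make.names() and replaces every character that is not valid in a syntactic name with a dot. A quick check (the variable names here are only illustrative):

```r
# make.names() turns "$", "[", ",", " " and "]" into dots.
original <- "Data_PCA$ind$coord[, 2]"
mangled <- make.names(original)
mangled

# The default check.names = TRUE in data.frame() applies the same rule.
df <- data.frame("Data_PCA$ind$coord[, 2]" = c(12, 15, -11))
names(df)
```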

Here's one way, indexing the column by position, which I highly recommend with the crazy column names:

best100_gene[best100_gene[3] < 0, ]
  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11

Here's another way, using the mangled column name:

best100_gene[best100_gene$Data_PCA.ind.coord...2. < 0, ]
  SYMBOL Data_PCA_contrib Data_PCA.ind.coord...2.
3      c             0.36                     -11
4      d             0.11                     -11
5      e             0.35                     -11

Here's another way, renaming the columns first:

good_names             <- c("symbol", "pca_contrib", "pca_coord")
colnames(best100_gene) <- good_names
best100_gene[best100_gene$pca_coord<0, ]
  symbol pca_contrib pca_coord
3      c        0.36       -11
4      d        0.11       -11
5      e        0.35       -11

Upvotes: 1
