Matt

Reputation: 73

How to hide rows with specific data in R?

I have a data frame with several rows. I want to select only the rows with specific values (1 and 0) and leave out the rows with the values N and X (as highlighted in the picture). The data frame looks like this:

[Picture]

The result should look like this:

The reason is that I'd like to use plot(TONICIDADE, VD) without the rows containing N and X appearing in the plot. I don't want to delete the rows with N and X; I just don't want them to be shown when plotting.

Upvotes: 1

Views: 2368

Answers (1)

Claus Wilke

Reputation: 17790

General thoughts on data manipulation

You're approaching this from a spreadsheet mindset, where data manipulation always messes up your original dataset and where it's expensive and cumbersome to make copies. In R, we don't hide parts of a data frame, we make copies that have only the parts (or modifications) we're interested in.
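A minimal base-R illustration of that idea, using R's built-in iris data set (the object name short_sepals is just an illustrative choice): the subset is a new object, and iris itself is left untouched.

```r
# Make a filtered copy; the original data frame is not modified.
short_sepals <- iris[iris$Sepal.Length < 5, ]

nrow(short_sepals)  # only the rows that passed the condition
nrow(iris)          # iris still has all 150 rows
```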

I don't have your dataset, so I'll use iris:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

There are many ways to remove rows. I prefer filter() from the dplyr package (load it first with library(dplyr)). For example, to remove cases with Sepal.Length >= 5, I could enter:

> iris2 <- filter(iris, Sepal.Length < 5)
> head(iris2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          4.9         3.0          1.4         0.2  setosa
2          4.7         3.2          1.3         0.2  setosa
3          4.6         3.1          1.5         0.2  setosa
4          4.6         3.4          1.4         0.3  setosa
5          4.4         2.9          1.4         0.2  setosa
6          4.9         3.1          1.5         0.1  setosa

(You tell filter() what you want to keep, not what you want to remove.)
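If you'd rather not load dplyr, base R's subset() works the same way: it also takes the condition for the rows to keep. One difference worth knowing is sketched below (iris2_base is an illustrative name): subset() keeps the original row names, whereas filter() renumbers the rows, as in the output above.

```r
# Base-R counterpart of filter(iris, Sepal.Length < 5):
# the condition describes the rows to KEEP.
iris2_base <- subset(iris, Sepal.Length < 5)

# subset() keeps the original row names, so you can still see
# which rows of iris were selected.
head(rownames(iris2_base))
```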

It's also possible to use the pipe operator %>% (exported by dplyr) to feed the filtered data frame straight into the next function. This is useful when you need the filtered data only once and want to use it right away. So, to filter and then plot, I could do:

library(dplyr)
library(ggplot2)

filter(iris, Sepal.Length < 5) %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length)) + geom_point()

[Picture: scatter plot of Sepal.Length against Sepal.Width for the rows with Sepal.Length < 5]

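The base-R plot() function doesn't take the data frame as its first argument the way ggplot() does (given a whole data frame, it draws a scatterplot matrix instead), so it fits less naturally at the end of a %>% pipe. With base graphics it's usually simplest to filter into a new object first (like iris2 above) and plot that.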
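For base graphics, plot() also has a formula method that takes a data argument and, optionally, a subset argument. The sketch below (iris_short is an illustrative name) shows both: plotting a filtered copy, and letting plot() pick the rows itself, which leaves the data frame unchanged and only restricts what is drawn.

```r
# Variant 1: make a filtered copy, then plot it via the formula method,
# which looks the variables up in the data frame given as `data`.
iris_short <- subset(iris, Sepal.Length < 5)
plot(Sepal.Length ~ Sepal.Width, data = iris_short)

# Variant 2: let plot() select the rows through its `subset` argument;
# iris is left unchanged and no filtered copy is stored.
plot(Sepal.Length ~ Sepal.Width, data = iris, subset = Sepal.Length < 5)
```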

Solving your specific problem

To filter out a specific set of values, you can generally use the pattern !variable %in% c(...), where variable is the column you want to filter on and c(...) is the vector of values you want to exclude, e.g.:

filter(data, !VD %in% c('N', 'X'))

Use the same pattern without the ! to list the values you want to keep instead of the ones you want to exclude.
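A note on why !VD %in% c('N', 'X') does what you expect: in R, %in% binds more tightly than !, so the negation applies to the whole membership test. A small sketch on a plain vector (vd, keep_a and keep_b are illustrative names) shows the two spellings agree and that the same logical index works with base-R bracket subsetting:

```r
vd <- c("1", "0", "X", "N", "1")

# %in% has higher precedence than !, so these are the same test.
keep_a <- !vd %in% c("N", "X")
keep_b <- !(vd %in% c("N", "X"))
identical(keep_a, keep_b)  # TRUE

# The logical vector can be used directly for base-R subsetting.
vd[keep_a]  # "1" "0" "1"
```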

Examples:

> data <- data.frame(VD = c("1", "0", "X", "N", "1"), values = rnorm(5))
> data
  VD      values
1  1 -0.56295856
2  0  0.36063581
3  X  0.06490702
4  N -0.23342063
5  1 -0.18901558
> filter(data, !VD %in% c('N', 'X'))
  VD     values
1  1 -0.5629586
2  0  0.3606358
3  1 -0.1890156
> filter(data, VD %in% c('0', '1'))
  VD     values
1  1 -0.5629586
2  0  0.3606358
3  1 -0.1890156
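Applied to the question's goal of plotting without deleting anything, a sketch might look like the following. The data frame dados and its values are hypothetical stand-ins, since the real data isn't shown, and it assumes both TONICIDADE and VD are stored as factors. If they are, subsetting keeps all factor levels, so the unused N and X levels may still show up in the plot; droplevels() removes them.

```r
# Hypothetical stand-in for the asker's data (real column types unknown).
dados <- data.frame(
  TONICIDADE = factor(c("tonica", "atona", "tonica", "atona", "tonica", "atona")),
  VD         = factor(c("1", "0", "N", "X", "0", "1"))
)

# Logical index of the rows to show; dados itself is not modified.
keep <- dados$VD %in% c("0", "1")

# Subsetting keeps every factor level, including "N" and "X";
# droplevels() drops the levels that no longer occur.
dados_plot <- droplevels(dados[keep, ])

levels(dados_plot$VD)  # "0" "1"
nrow(dados)            # the original still has all 6 rows

# With two factors, plot() draws a spineplot.
plot(dados_plot$TONICIDADE, dados_plot$VD)
```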

Upvotes: 1
