Reputation: 73
I have a data frame with several rows. I want to select only the rows with specific values (1 and 0) and leave out the rows with the values N and X (as highlighted in the picture). The data frame is as follows:
[Picture]
The result should look like the following:
The reason is that I'd like to use plot(TONICIDADE, VD) without including the rows containing N and X in the plot. I don't want to delete the rows with N and X; I just don't want them shown when plotting.
Upvotes: 1
Views: 2368
Reputation: 17790
You're approaching this from a spreadsheet mindset, where data manipulation always messes up your original dataset and where it's expensive and cumbersome to make copies. In R, we don't hide parts of a data frame; instead, we make copies that contain only the parts (or modifications) we're interested in.
I don't have your dataset, so I'll use iris:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
There are many ways to remove rows. I prefer filter() from the dplyr package. For example, to remove cases with Sepal.Length >= 5, I could enter:
> library(dplyr)  # without this, filter() resolves to stats::filter()
> iris2 <- filter(iris, Sepal.Length < 5)
> head(iris2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          4.9         3.0          1.4         0.2  setosa
2          4.7         3.2          1.3         0.2  setosa
3          4.6         3.1          1.5         0.2  setosa
4          4.6         3.4          1.4         0.3  setosa
5          4.4         2.9          1.4         0.2  setosa
6          4.9         3.1          1.5         0.1  setosa
(You tell filter() what you want to keep, not what you want to remove.)
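If you'd rather not load dplyr, the same "keep what matches" idea can be sketched in base R. This is only an illustration of two common alternatives; the names iris_subset and iris_indexed are made up here:

```r
# iris ships with base R (datasets package), so no extra packages are needed.

# Option 1: subset() evaluates the condition inside the data frame.
iris_subset <- subset(iris, Sepal.Length < 5)

# Option 2: index rows with a logical vector; the trailing comma keeps all columns.
iris_indexed <- iris[iris$Sepal.Length < 5, ]

# Both return copies; iris itself still has all of its rows.
print(nrow(iris))
print(nrow(iris_subset))
```

One visible difference from filter(): both base-R approaches keep the original row names (2, 3, 4, 7, ...), while filter() renumbers the rows from 1.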
It's also possible to use the pipe operator %>% to feed the modified data frame right into the next function. You do this if you want to filter only once and use the result right away. So, if I wanted to filter and then plot, I could do:
library(dplyr)    # filter() and %>%
library(ggplot2)  # ggplot(), aes(), geom_point()
filter(iris, Sepal.Length < 5) %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length)) + geom_point()
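Unfortunately, the base-R plot() function doesn't pick x and y columns out of a piped-in data frame the way ggplot() does (given a whole data frame, it draws a scatterplot matrix), so the pipe approach works most naturally with ggplot().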
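If you want to stay with base graphics, one option is the formula interface of plot(), which accepts data and subset arguments. A minimal sketch on iris, not the only way to do it:

```r
# plot() has a formula method: y ~ x, with the columns looked up in `data`.
# `subset` is evaluated inside `data`, so only matching rows are drawn;
# iris itself is not modified.
plot(Sepal.Length ~ Sepal.Width,
     data = iris,
     subset = Sepal.Length < 5)
```

This filters only for the duration of the plot call, so there is no intermediate copy to keep track of.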
To your specific problem: To filter out a specific list of values, you can generally use the !variable %in% c(...) pattern, where variable is the variable you want to filter on and c(...) is the vector of things you want to exclude, e.g.:
filter(data, !VD %in% c('N', 'X'))
And you use the same pattern without the ! to list the values you want to include rather than exclude.
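To see why the pattern works, it can help to look at what %in% returns by itself. A small base-R sketch; the vector vd is made up for illustration:

```r
# Made-up vector standing in for the VD column.
vd <- c("1", "0", "X", "N", "1")

# %in% returns one TRUE/FALSE per element of the left-hand side.
is_excluded <- vd %in% c("N", "X")
print(is_excluded)

# ! binds more loosely than %in%, so !vd %in% ... means !(vd %in% ...).
is_kept <- !vd %in% c("N", "X")
print(vd[is_kept])
```

filter() simply keeps the rows where this logical vector is TRUE.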
Examples:
> data <- data.frame(VD = c("1", "0", "X", "N", "1"), values = rnorm(5))
> data
  VD      values
1  1 -0.56295856
2  0  0.36063581
3  X  0.06490702
4  N -0.23342063
5  1 -0.18901558
> filter(data, !VD %in% c('N', 'X'))
  VD     values
1  1 -0.5629586
2  0  0.3606358
3  1 -0.1890156
> filter(data, VD %in% c('0', '1'))
  VD     values
1  1 -0.5629586
2  0  0.3606358
3  1 -0.1890156
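Putting this together for the plot in the question, here is a rough base-R sketch. The data frame dados and its contents are invented stand-ins, and the column types are assumptions (TONICIDADE numeric, VD stored as text), so adjust the conversion step to the real data:

```r
# Hypothetical stand-in for the asker's data; values and types are assumptions.
dados <- data.frame(
  TONICIDADE = c(1, 2, 3, 4, 5, 6),
  VD = c("1", "0", "X", "N", "1", "0"),
  stringsAsFactors = FALSE
)

# Copy only the rows to plot; dados keeps its N and X rows.
plot_data <- dados[dados$VD %in% c("0", "1"), ]

# VD is text in this sketch, so convert it to numbers before plotting.
plot_data$VD <- as.numeric(plot_data$VD)

plot(plot_data$TONICIDADE, plot_data$VD, xlab = "TONICIDADE", ylab = "VD")
```

The original dados is left as it was, so the N and X rows are still available for anything else you do with the data.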
Upvotes: 1