Reputation: 328
I have a very large dataset and am trying to filter out specific rows. Here is a link to the public dataset, which can be downloaded for example purposes.
Here is the code that I used:
library(readxl)
library(dplyr)
library(tidyverse)
# Set the working directory (setwd() expects the folder that holds the file, not the file itself)
setwd('path\\to\\folder')
data <- read_excel("dataset_for_stack_question.xlsx")
unique(data$country_of_interest)
Using the list returned by the unique command, I tried to filter out all the regions that I didn't want using the filter command:
filtered <- data %>%
  filter(country_of_interest != c("Middle Africa",
                                  "Eastern Africa",
                                  "Western Africa",
                                  "Northern Africa",
                                  "Western Asia",
                                  "Central Asia",
                                  "Southern Asia",
                                  "Eastern Asia",
                                  "Central America",
                                  "Australia / New Zealand",
                                  "Eastern Europe",
                                  "Northern Europe",
                                  "Southern Europe",
                                  "Western Europe",
                                  "More developed regions",
                                  "Less developed regions",
                                  "Least developed countries",
                                  "Less developed regions, excluding least developed countries",
                                  "High-income countries",
                                  "Middle-income countries",
                                  "Upper-middle-income countries",
                                  "Lower-middle-income countries",
                                  "Low-income countries",
                                  "No income group available",
                                  "Africa",
                                  "Asia",
                                  "Europe",
                                  "Latin America and the Caribbean",
                                  "Northern America",
                                  "Oceania"))
However, when I run the head(filtered, 20) command, I see that some rows that should have been removed are still there:
# Output of the head(filtered, 20) command:
# A tibble: 20 x 4
   Year  country_of_interest country             migration
   <chr> <chr>               <chr>                   <dbl>
 1 1990  Eastern Africa      Afghanistan                 0
 2 1990  Eastern Africa      American Samoa              0
 3 1990  Eastern Africa      Andorra                     0
 4 1990  Eastern Africa      Angola                 139108
 5 1990  Eastern Africa      Anguilla                    0
 6 1990  Eastern Africa      Antigua and Barbuda         0
 7 1990  Eastern Africa      Argentina                   0
 8 1990  Eastern Africa      Armenia                     0
 9 1990  Eastern Africa      Aruba                       0
10 1990  Eastern Africa      Australia                 148
11 1990  Eastern Africa      Austria                     0
12 1990  Eastern Africa      Azerbaijan                  0
13 1990  Eastern Africa      Bahamas                     0
14 1990  Eastern Africa      Bahrain                     0
15 1990  Eastern Africa      Bangladesh                131
16 1990  Eastern Africa      Barbados                    0
17 1990  Eastern Africa      Belarus                     0
18 1990  Eastern Africa      Belgium                   794
19 1990  Eastern Africa      Belize                      0
20 1990  Eastern Africa      Benin                       0
According to the filter code above, "Eastern Africa" should have been removed. Other regions in the list were also supposed to be removed but were not. How can I make sure that every row matching one of the listed regions is filtered out?
Upvotes: 0
Views: 85
Reputation: 328
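As stated by @akrun, the solution was to use the !country_of_interest %in% format instead of !=. The != operator compares the two vectors element by element and recycles the shorter one, so each row is checked against only one name from the list rather than all of them, whereas %in% tests each value against the whole list. The corrected code is below: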
filtered <- data %>%
  filter(!country_of_interest %in% c("Middle Africa",
                                     "Eastern Africa",
                                     "Western Africa",
                                     "Northern Africa",
                                     "Western Asia",
                                     "Central Asia",
                                     "Southern Asia",
                                     "Eastern Asia",
                                     "Central America",
                                     "Australia / New Zealand",
                                     "Eastern Europe",
                                     "Northern Europe",
                                     "Southern Europe",
                                     "Western Europe",
                                     "More developed regions",
                                     "Less developed regions",
                                     "Least developed countries",
                                     "Less developed regions, excluding least developed countries",
                                     "High-income countries",
                                     "Middle-income countries",
                                     "Upper-middle-income countries",
                                     "Lower-middle-income countries",
                                     "Low-income countries",
                                     "No income group available",
                                     "Africa",
                                     "Asia",
                                     "Europe",
                                     "Latin America and the Caribbean",
                                     "Northern America",
                                     "Oceania"))
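To see why the two operators behave differently, here is a minimal base R sketch. The vector x and the shortened exclude list are made up for illustration and are not taken from the real dataset; the logical vectors they produce are the same kind of condition that filter() would receive.

```r
# Hypothetical toy column standing in for data$country_of_interest
x <- c("Eastern Africa", "Kenya", "Africa", "Kenya", "Africa", "Eastern Africa")

# Hypothetical, shortened exclusion list
exclude <- c("Africa", "Eastern Africa")

# != recycles `exclude` to the length of x, so each element is compared
# with only one entry: x[1] vs "Africa", x[2] vs "Eastern Africa", and so on
keep_wrong <- x != exclude

# %in% checks each element of x against the whole exclusion list
keep_right <- !(x %in% exclude)

x[keep_wrong]  # "Eastern Africa" "Kenya" "Kenya"
x[keep_right]  # "Kenya" "Kenya"

# Sanity check: nothing from the exclusion list survives the %in% version
stopifnot(!any(x[keep_right] %in% exclude))
```

With !=, the first "Eastern Africa" survives because it happens to be compared with "Africa". The same kind of check can be run on the real result, for example any(filtered$country_of_interest %in% exclude) should return FALSE if the exclusion list is stored in a vector named exclude.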
Upvotes: 1