Carl Rynegardh

Reputation: 558

R - Removing rows in data frame by list of column values

I have two data frames, one containing the predictors and one containing the different categories I want to predict. Both data frames contain a column named geoid. Some of the rows of my predictors contain NA values, and I need to remove these. After extracting the geoid values of the rows containing NA values and removing those rows from the predictors data frame, I need to remove the corresponding rows from the categories data frame as well. It seems like a rather basic operation, but the code doesn't work.

categories <- as.data.frame(read.csv("files/cat_df.csv"))
predictors <- as.data.frame(read.csv("files/radius_100.csv"))
NA_rows <- predictors[!complete.cases(predictors),]
geoids <- NA_rows['geoid']
clean_categories <- categories[!(categories$geoid %in% geoids),]

None of the rows in categories/clean_categories are removed.

A typical geoid value is US06140231. typeof(categories$geoid) returns integer.

Upvotes: 0

Views: 80

Answers (1)

doctorG

Reputation: 1731

I can't say for certain this is the cause, but a very basic typo means the line won't do what you want. Try this correction:

clean_categories <- categories[!(categories$geoid %in% geoids),]

Almost certainly this is what you meant to happen in that line: you want to negate the result of the %in% operator. You didn't include a reproducible example, so I can't say whether the whole thing will do what you want.
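One more thing worth checking, offered as a guess since the data isn't shown: NA_rows['geoid'] with single brackets returns a one-column data frame rather than a vector, and %in% against a data frame generally won't match the individual values. Here is a minimal sketch of the difference. The toy data frames and the names geoids_df, geoids_vec, unchanged and clean_predictors are made up for illustration and stand in for the CSV files.

```r
# Hypothetical toy data standing in for the CSV files, which are not shown.
predictors <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  x     = c(1.5, NA, 3.2, NA),
  stringsAsFactors = FALSE
)
categories <- data.frame(
  geoid = c("US01", "US02", "US03", "US04"),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Rows of predictors that contain at least one NA.
NA_rows <- predictors[!complete.cases(predictors), ]

# Single brackets return a one-column data frame, not a vector.
geoids_df <- NA_rows["geoid"]
# $ (or [["geoid"]]) returns the column itself as a vector.
geoids_vec <- NA_rows$geoid

# Against the data frame, %in% finds no matches here, so nothing is dropped.
unchanged <- categories[!(categories$geoid %in% geoids_df), ]

# Against the vector, %in% matches the geoids, so those rows are dropped.
clean_categories <- categories[!(categories$geoid %in% geoids_vec), ]
clean_predictors <- predictors[complete.cases(predictors), ]
```

If this is what is happening in your case, changing the extraction to geoids <- NA_rows$geoid should be enough, and the rest of your code can stay as it is.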

Upvotes: 1
