Reputation: 3594
I am working on the SAT Scores database: https://nycopendata.socrata.com/Education/SAT-Results/f9bf-2cp4?
This is what it looks like:
> head(SAT)
DBN SCHOOL.NAME Num.of.SAT.Test.Takers
1 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES 29
2 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 91
3 01M450 EAST SIDE COMMUNITY SCHOOL 70
4 01M458 FORSYTH SATELLITE ACADEMY 7
5 01M509 MARTA VALLE HIGH SCHOOL 44
6 01M515 LOWER EAST SIDE PREPARATORY HIGH SCHOOL 112
SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
1 355 404 363
2 383 423 366
3 377 402 370
4 414 401 359
5 390 433 384
6 332 557 316
In the Column Num.of.SAT.Test.Takers, many values are simply the character 's'. The corresponding values for the 's' columns also have 's' and no numeric scores.
> SATnocandidates<-SAT[SAT$Num.of.SAT=='s', ]
> head(SATnocandidates)
DBN SCHOOL.NAME Num.of.SAT.Test.Takers
23 02M392 MANHATTAN BUSINESS ACADEMY s
24 02M393 BUSINESS OF SPORTS SCHOOL s
26 02M399 THE HIGH SCHOOL FOR LANGUAGE AND DIPLOMACY s
39 02M427 MANHATTAN ACADEMY FOR ARTS & LANGUAGE s
41 02M437 HUDSON HIGH SCHOOL OF LEARNING TECHNOLOGIES s
42 02M438 INTERNATIONAL HIGH SCHOOL AT UNION SQUARE s
SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
23 s s s
24 s s s
26 s s s
39 s s s
41 s s s
42 s s s
Questions
Upvotes: 0
Views: 2948
Reputation: 193517
My comment as an answer...
Use the na.strings
argument to read your data in. Assuming you had downloaded the CSV version of the dataset to your "Downloads" directory, you would us a command like:
SAT <- read.csv("~/Downloads/SAT_Results.csv", na.strings = "s")
Note that the na.strings
argument is plural--you can have multiple values that get read in as NA
.
Another option, if the data is already in your R workspace, is to get rid of your "s" values just by coercion. The columns are likely to be factors or characters at the moment. If you convert them to numeric, the "s" values would automatically become NA
(you'll get warnings, but the warnings are only telling us what we already know).
So for instance, imagine we started here:
SAT <- read.csv("~/Downloads/SAT_Results.csv", na.strings = "s")
If we wanted to apply our operation across all numeric columns (all but the first two columns), we could do:
SAT[-c(1, 2)] <- lapply(SAT[-c(1, 2)], function(x) as.numeric(as.character(x)))
Alternatively, if you wanted to change just the third column, you can use something like the following:
SAT[[3]] <- as.numeric(as.character(SAT[[3]]))
Upvotes: 2