vagabond
vagabond

Reputation: 3594

Replacing character values in Data Frame Column with numeric value

I am working on the SAT Scores database: https://nycopendata.socrata.com/Education/SAT-Results/f9bf-2cp4?

This is what it looks like:

> head(SAT)
 DBN                                   SCHOOL.NAME Num.of.SAT.Test.Takers
1 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL STUDIES                     29
2 01M448           UNIVERSITY NEIGHBORHOOD HIGH SCHOOL                     91
3 01M450                    EAST SIDE COMMUNITY SCHOOL                     70
4 01M458                     FORSYTH SATELLITE ACADEMY                      7
5 01M509                       MARTA VALLE HIGH SCHOOL                     44
6 01M515       LOWER EAST SIDE PREPARATORY HIGH SCHOOL                    112
  SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
1                             355                 404                    363
2                             383                 423                    366
3                             377                 402                    370
4                             414                 401                    359
5                             390                 433                    384
6                             332                 557                    316

In the Column Num.of.SAT.Test.Takers, many values are simply the character 's'. The corresponding values for the 's' columns also have 's' and no numeric scores.

> SATnocandidates<-SAT[SAT$Num.of.SAT=='s', ]
> head(SATnocandidates)
      DBN                                 SCHOOL.NAME Num.of.SAT.Test.Takers
23 02M392                  MANHATTAN BUSINESS ACADEMY                      s
24 02M393                   BUSINESS OF SPORTS SCHOOL                      s
26 02M399  THE HIGH SCHOOL FOR LANGUAGE AND DIPLOMACY                      s
39 02M427       MANHATTAN ACADEMY FOR ARTS & LANGUAGE                      s
41 02M437 HUDSON HIGH SCHOOL OF LEARNING TECHNOLOGIES                      s
42 02M438   INTERNATIONAL HIGH SCHOOL AT UNION SQUARE                      s
   SAT.Critical.Reading.Avg..Score SAT.Math.Avg..Score SAT.Writing.Avg..Score
23                               s                   s                      s
24                               s                   s                      s
26                               s                   s                      s
39                               s                   s                      s
41                               s                   s                      s
42                               s                   s                      s

Questions

  1. In the original SAT dataframe, I want to replace all 's' values in $Num.of.SAT column with numeric vector 0.
  2. Subsequently, I want to selectively replace all 's' values in corresponding columns to 0.
  3. How can I write an overarching command to find and replace all 's' values in the data frame to 0?

Upvotes: 0

Views: 2948

Answers (1)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

My comment as an answer...

Use the na.strings argument to read your data in. Assuming you had downloaded the CSV version of the dataset to your "Downloads" directory, you would us a command like:

SAT <- read.csv("~/Downloads/SAT_Results.csv", na.strings = "s")

Note that the na.strings argument is plural--you can have multiple values that get read in as NA.


Another option, if the data is already in your R workspace, is to get rid of your "s" values just by coercion. The columns are likely to be factors or characters at the moment. If you convert them to numeric, the "s" values would automatically become NA (you'll get warnings, but the warnings are only telling us what we already know).

So for instance, imagine we started here:

SAT <- read.csv("~/Downloads/SAT_Results.csv", na.strings = "s")

If we wanted to apply our operation across all numeric columns (all but the first two columns), we could do:

SAT[-c(1, 2)] <- lapply(SAT[-c(1, 2)], function(x) as.numeric(as.character(x)))

Alternatively, if you wanted to change just the third column, you can use something like the following:

SAT[[3]] <- as.numeric(as.character(SAT[[3]]))

Upvotes: 2

Related Questions