R count number of specific string in data frame

Question

Sorry for beginner questions.

I have a data frame (I think; please correct me if I'm wrong here):

data <- read.csv("adult.data", sep=',', header=F)

Data is https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

When data is missing, it just has "?" instead of data. I need to count how much data is missing in each column.

I can count instances of a number, but not strings.

Col 1 is age, so I can do this:

length(which(data[,1] == 55))

And it will tell me how many people were 55 in this dataset.

But if I try

length(which(data[,2] == "?"))

It says 0.

How do I compare strings in R?

Rich Scriven · Accepted Answer

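The comparison returns 0 because the file separates fields with a comma followed by a space, so the value in column 2 is actually " ?" (with a leading space), not "?". It looks like if you read it in again with na.strings = "?" and strip.white = TRUE, you'll get proper NA values and be able to use is.na():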

df <- read.csv(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", 
    header = FALSE, 
    na.strings = "?", 
    strip.white = TRUE
)

## total NA in the data
sum(is.na(df))
# [1] 4262

## total NA for column 2
sum(is.na(df[[2]]))
# [1] 1836

## count NA by column
colSums(is.na(df))
#   V1   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11  V12  V13  V14  V15
#    0 1836    0    0    0    0 1843    0    0    0    0    0    0  583    0
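
If you'd rather see the effect without downloading anything, here is a minimal sketch on a made-up three-row sample. The `txt` string and the names `raw` and `clean` are hypothetical, chosen only to mimic the ", " separators in adult.data. It shows why the direct comparison finds nothing, how trimming whitespace with trimws() fixes it, and that the na.strings / strip.white route gives the same counts.

```r
# Hypothetical sample imitating the "comma + space" layout of adult.data
txt <- "39, State-gov, 77516\n50, ?, 83311\n38, ?, 215646"

# Read it the way the question did (no whitespace stripping)
raw <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

# The field keeps its leading space, so it holds " ?" rather than "?"
sum(raw[[2]] == "?")
# [1] 0

# Trimming whitespace first makes the string comparison match
sum(trimws(raw[[2]]) == "?")
# [1] 2

# Count "?" in every column without re-reading the data
qmark_counts <- colSums(sapply(raw, function(x) trimws(x) == "?"))

# Same sample read with the arguments used above: "?" becomes NA
clean <- read.csv(text = txt, header = FALSE,
                  na.strings = "?", strip.white = TRUE)
na_counts <- colSums(is.na(clean))
```

Both `qmark_counts` and `na_counts` should report two hits in V2 and none elsewhere. Converting to NA at read time is usually the more convenient choice, since functions like is.na() and complete.cases() then work directly.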
