Reputation: 163
I want to identify the column names and row indices where a specific value occurs. Here is my sample data frame:
set.seed(1)
age_range = sample(c("ar2-15", "ar16-29", "ar30-44"), 200, replace = TRUE)
gender = sample(c("M", "F",-999), 200, replace = TRUE)
region = sample(c("A", "B", "C"), 200, replace = TRUE)
physi = sample(c("Poor", "Average", "Good"), 200, replace = TRUE)
height = sample(c(4,5,6,-999), 200, replace = TRUE)
height2 = sample(c(40,0), 200, replace = TRUE)
weight2 = sample(c(20,0,-999), 200, replace = TRUE)
survey = data.frame(age_range, gender, region,physi,height,height2,weight2)
head(survey)
How can I find the column names and indices in survey df where -999 exists? I tried using some combination of apply and which, but it did not work. Obviously I am doing something wrong.
EDIT:
> apply(survey,2,function(x) match(-999,x))
age_range gender region physi height height2 weight2
NA 10 NA NA 1 NA 2
This returns every column name, with NA for the columns that don't contain -999 and only the first matching row for the ones that do. Any pointers are highly appreciated. Thanks! Jennifer
Upvotes: 1
Views: 94
Reputation: 163
Building off of d.b.'s comment, I came up with this short piece of code, which does what I want. Thank you, Stack Overflow community!
# Column positions that contain -999 at least once
q <- unique(which(survey == -999, arr.ind = TRUE)[, "col"])
q # 2 5 7
names(survey)[q] # [1] "gender" "height" "weight2"
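If the row positions are needed along with the column names, the same `which(..., arr.ind = TRUE)` result can be split by column. This is only a minimal sketch on a small stand-in data frame; `toy`, `hits`, and `rows_by_col` are illustrative names that do not appear in the question.

```r
# Toy data frame standing in for `survey` (hypothetical values)
toy <- data.frame(
  gender = c("M", "-999", "F", "-999"),
  region = c("A", "B", "C", "A"),
  height = c(-999, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Matrix of (row, col) positions where the value occurs
hits <- which(toy == -999, arr.ind = TRUE)

# Row indices grouped by the name of the column they were found in
rows_by_col <- split(unname(hits[, "row"]), names(toy)[hits[, "col"]])
rows_by_col
```

Columns without any match (here `region`) simply do not appear in the resulting list.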
Upvotes: 2
Reputation: 1987
Does lapply with which give you what you want? It returns a list named after your columns, where each element contains the row indices at which that column equals -999.
lapply(survey,function(x) which(x==-999))
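The list above also includes empty entries for columns that never contain -999. One possible way to drop them is base R's `Filter`, sketched here on a small stand-in data frame; `toy` and `idx` are illustrative names, not part of the original question.

```r
# Toy data frame standing in for `survey` (hypothetical values)
toy <- data.frame(
  gender = c("M", "-999", "F", "-999"),
  region = c("A", "B", "C", "A"),
  height = c(-999, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Row indices of -999 for every column (empty vector when there is no match)
idx <- lapply(toy, function(x) which(x == -999))

# Keep only the columns with at least one match
idx <- Filter(length, idx)

names(idx)  # column names that contain -999
idx         # row indices per column
```

`Filter(length, idx)` works because a length of zero is treated as FALSE and any positive length as TRUE.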
Upvotes: 1