dustin

Reputation: 4406

R: Extracting data from a data frame for analysis

I am trying to extract data from a data frame for analysis.

heightweight <- function(person, health) {
    ## Read in data
    data <- read.csv("heightweight.csv", header = TRUE,
                     colClasses = "character")
    ## Check that the outcomes are valid
    measure = c("height", "weight")
    if(health %in% measure == FALSE){
        stop("Valid inputs are height and weight")
    }
    ## Truncate the data matrix to only what columns are needed
    data <- data[c(1, 5, 7)]
    ## Rename columns
    names(data)[1] <- "Name"
    names(data)[2] <- "Height"
    names(data)[3] <- "Weight"
    ## Convert numeric columns to numeric
    data[, 2] <- as.numeric(data[, 3])
    data[, 3] <- as.numeric(data[, 4])
    ## Convert NAs to 0 after coercion
    data[is.na(data)] <- 0
    ## Check that the name is valid
    name <- data[, 1]
    name <- unique(name)
    if(person %in% name == FALSE){
        stop("Invalid person")
    }
    ## Return person with lowest height or weight
    list <- data[data$name == person & data[health],]
    outcomes <- list[, health]
    minumum <- which.min(outcomes)
    ## Min Rate
    minimum[rowNum, ]$name
}

The problem occurs on this line:

list <- data[data$name == person & data[health],]

That is, when I run heightweight("Bob", "weight"), I get the following error message:

Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE), nrow = nr,  : 
  length of 'dimnames' [2] not equal to array extent

I have Googled this message and checked out some threads here but can't determine what the problem is.
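Several things in that line look suspect. After the renaming step the columns are `Name`, `Height` and `Weight`, so the lowercase `data$name` and `health = "weight"` do not refer to them (R column names are case-sensitive). In addition, `data[health]` with single brackets returns a one-column data frame rather than a logical vector, so combining it with `&` does not give a usable row condition. The sketch below checks each point on a small hypothetical data frame (the names `Bob`/`Ann` and all values are invented, not taken from heightweight.csv):

```r
# Hypothetical stand-in for the data frame after the renaming step
data <- data.frame(Name   = c("Bob", "Bob", "Ann"),
                   Height = c(70, 71, 64),
                   Weight = c(180, 175, 130),
                   stringsAsFactors = FALSE)

# Column names are case-sensitive, so data$name finds no column and is NULL
print(is.null(data$name))

# health = "weight" does not match the renamed column "Weight" either
print("weight" %in% names(data))

# Single brackets return a one-column data frame, not a vector,
# so it cannot serve as a logical row condition
print(class(data["Weight"]))
print(class(data[["Weight"]]))

# A logical row index plus a column name selects one person's values
weights <- data[data$Name == "Bob", "Weight"]
print(weights)
```

Whether this is exactly what produced the `dimnames` error depends on the real CSV, but matching the column names and indexing rows with a plain logical vector removes these problems.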

Upvotes: 0

Views: 159

Answers (2)

AleMorales

Reputation: 46

Unless I'm missing something, if you only need the lowest weight or height for a given name, the last three lines of code are a bit redundant.

Here's a simple way to get the minimum health measurement for a given person:

min(data[data$name == person, health])

The first part, before the comma, selects only the rows of data that correspond to that person; it acts as a row index. The second part, after the comma, selects only the desired variable (column). Once you have selected the desired data, min() returns the smallest value in that subset.
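One detail worth checking when applying this to the question's function: it replaces every NA with 0 before taking the minimum, so any person with a missing measurement would get 0 as their "lowest" value. A minimal sketch with made-up numbers shows the difference between that approach and letting min() skip missing values via its `na.rm` argument:

```r
# Made-up measurements for one person, with one missing value
x <- c(160, NA, 150)

# By default a single NA makes the minimum NA
print(min(x))

# na.rm = TRUE ignores missing values instead
print(min(x, na.rm = TRUE))

# Replacing NA with 0 (as the question's function does) makes 0 the minimum
x0 <- x
x0[is.na(x0)] <- 0
print(min(x0))
```

So `min(data[data$name == person, health], na.rm = TRUE)` is probably closer to the intent than converting NAs to 0.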

An example to illustrate the result:

data <- data.frame(name = c(rep("carlos", 2), rep("marta", 3), rep("johny", 2), "sara"), stringsAsFactors = FALSE)
set.seed(1)
data$height <- rnorm(8,68,3)
data$weight <- rnorm(8,160,10)

The corresponding data frame:

   name   height   weight
1 carlos 66.12064 165.7578
2 carlos 68.55093 156.9461
3  marta 65.49311 175.1178
4  marta 72.78584 163.8984
5  marta 68.98852 153.7876
6  johny 65.53859 137.8530
7  johny 69.46229 171.2493
8   sara 70.21497 159.5507

Let's say we want the minimum weight for marta:

person <- "marta"
health <- "weight"

The minimum "weight" for "marta" is then:

min(data[data$name==person,health])

which gives the desired result:

[1] 153.7876
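If you need the whole record rather than just the value (the question's `which.min` step seems to aim at that), `which.min()` returns the position of the first minimum within the person's subset, which can then be used as a row index. A sketch that rebuilds the example data above:

```r
# Rebuild the example data frame from this answer
data <- data.frame(name = c(rep("carlos", 2), rep("marta", 3), rep("johny", 2), "sara"),
                   stringsAsFactors = FALSE)
set.seed(1)
data$height <- rnorm(8, 68, 3)
data$weight <- rnorm(8, 160, 10)

person <- "marta"
health <- "weight"

# Keep only this person's rows, then take the row where the chosen measure is smallest
rows <- data[data$name == person, ]
best <- rows[which.min(rows[[health]]), ]
print(best)
```

This should print row 5 (marta, weight 153.7876), the same value as above but with the name and height attached.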

Upvotes: 3

Marat Talipov

Reputation: 13304

Here is a simplified analogue of your function:

heightweight <- function(person,health) {
  data.set <- data.frame(names = rep(letters[1:5], each = 3),
                         height = 171:185,
                         weight = seq(95, 81, by = -1))
  d1 <- data.set[data.set$names == person, ]
  d2 <- d1[d1[, health] == min(d1[, health]), ]
  d2[, c('names', health)]
}

The first line produces a sample data set. The second line selects all records for the given person. The third line keeps the record(s) where health equals its minimum, and the last line returns only the name and that measurement.

heightweight('b','height')
#   names height
# 4     b    174
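A design note on the `== min(...)` comparison used here: it keeps every row that ties for the minimum, whereas indexing with `which.min()` keeps only the first of them. A small sketch on hypothetical data (the values are invented) shows the difference:

```r
# Hypothetical data where two records share the minimum height
d1 <- data.frame(names = c("b", "b", "b"),
                 height = c(175, 174, 174),
                 stringsAsFactors = FALSE)

# Comparing with min() keeps every tied row
all_min <- d1[d1[, "height"] == min(d1[, "height"]), ]

# which.min() keeps only the first of the tied rows
first_min <- d1[which.min(d1[, "height"]), ]

print(nrow(all_min))
print(nrow(first_min))
```

Which behaviour is preferable depends on whether you want all tied records or exactly one.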

Upvotes: 0
