DirtStats

Reputation: 599

How to test for an absent row/value in a dataframe to help transpose part of it?

I have a dataframe containing data on repeatedly sampled individuals and days alive. Some individuals were not sampled on every day alive. I want to move the data from row-oriented (one row per individual and day alive) to column-oriented (one row per individual, with one column per day alive).

However, the code I am running for this exits with an error when an individual does not have a row for a certain day alive in the first DF, because the second DF has a column for that day alive. I haven't found a good way to test for the absence of a row and value in the first DF: the lookup returns a numeric of length 0 (i.e., numeric(0)), and a logical test against such a variable doesn't give a logical answer (TRUE or FALSE); it just yields logical(0).
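The behaviour described above can be reproduced in a few lines. This is a minimal sketch using a hypothetical lookup vector, `lengths_by_day`, that is not part of the original code. It suggests that a comparison against a zero-length value is itself zero-length, so the test for absence has to go through `length()` instead.

```r
# Hypothetical lookup: lengths recorded for one individual, keyed by day.
lengths_by_day <- c("1" = 0.8, "3" = 1.7)

# Selecting a day that was never sampled gives a zero-length vector.
missing_value <- unname(lengths_by_day[names(lengths_by_day) == "2"])

# A comparison on a zero-length vector is also zero-length, not TRUE/FALSE.
comparison <- missing_value == 1

# length() always returns a single number, so this test is safe inside if().
is_absent <- length(missing_value) == 0

# Substitute NA when nothing was found.
value_or_na <- if (is_absent) NA_real_ else missing_value
```

Here `comparison` is `logical(0)`, which is why `if (missing_value == 1)` fails, while `is_absent` is a single `TRUE`.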

What follows is a simplified example of what I'm trying to do. I know there may be other ways to deal with some of the larger data movements I'm doing, but would like to do it this way if possible. The code below will get stuck when individual=B and dayAlive=2, because there is no dayAlive=2 for that individual. I'd like to be able to test for the absence of a row like this and then insert an NA or something else into the second data frame cell where that data would go.

# Initialize data in row format in first data frame:
v1<-c("A",1,1.3)
v2<-c("A",2,1.8)
v3<-c("A",3,2.4)
v4<-c("B",1,0.8)
v5<-c("B",3,1.7)
first_DF<-data.frame(matrix(c(v1,v2,v3,v4,v5),ncol=3, nrow=5,byrow=TRUE,dimnames=list(NULL,c("Individual","DayAlive","Length"))), stringsAsFactors=FALSE)

# Convert to column format in second data frame:
individual_IDs<-unique(first_DF$"Individual")
days_alive<-unique(first_DF$"DayAlive")

# Initialize second DF by subsetting a single row for each individual from the first DF
second_DF<-data.frame(first_DF[which(first_DF$"Individual" %in% individual_IDs & first_DF$"DayAlive" %in% 1),1], stringsAsFactors=FALSE)
names(second_DF)<-"Individual"
initial_DF_width<-dim(second_DF)[2]

# Move 'Length' data into the columns as each 'day alive' column is created:
for(i in 1:length(days_alive)){
  current_day<-days_alive[i]
  second_DF<-cbind(second_DF,matrix(ncol=1, nrow=nrow(second_DF),dimnames=list(NULL,paste("Day ",current_day," Length"))))

  for(j in 1:length(individual_IDs)){
    current_individualID<-individual_IDs[j]
    length<-first_DF[which(first_DF$"Individual" %in% current_individualID & first_DF$"DayAlive" %in% current_day),"Length"]
    second_DF[j,i+initial_DF_width]<-length
  }
}

This is the error it throws:

Error in `[<-.data.frame`(`*tmp*`, j, i + initial_DF_width, value = character(0)) : replacement has length zero

(In my real code I had converted that data to numeric but didn't bother here).

Upvotes: 0

Views: 1300

Answers (1)

Kara Woo

Reputation: 3615

You should look into the reshape2 package. Try this:

library('reshape2')

dcast(first_DF, Individual ~ DayAlive, value.var = "Length")
#   Individual   1    2   3
# 1          A 1.3  1.8 2.4
# 2          B 0.8 <NA> 1.7
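If adding a package is not an option, base R's `reshape()` can do a similar long-to-wide conversion. The sketch below is an alternative to `dcast()`, not what the answer above uses. It assumes the example data has been rebuilt with a numeric `Length` column, and `wide_DF` is a name chosen here for illustration.

```r
# Same example data as in the question, with numeric DayAlive and Length.
first_DF <- data.frame(
  Individual = c("A", "A", "A", "B", "B"),
  DayAlive   = c(1, 2, 3, 1, 3),
  Length     = c(1.3, 1.8, 2.4, 0.8, 1.7),
  stringsAsFactors = FALSE
)

# Long to wide: one row per Individual, one Length column per DayAlive.
# Combinations with no row in first_DF are filled with NA.
wide_DF <- reshape(first_DF,
                   idvar     = "Individual",
                   timevar   = "DayAlive",
                   direction = "wide")
```

The resulting columns are named `Length.1`, `Length.2` and `Length.3`, and individual B gets `NA` in `Length.2`.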

Since you said you wanted to do it your way if possible, I've also edited your nested loop to work. However, I would not advise doing it this way. Most people will tell you that nested loops in R are usually not the best idea, and that's definitely true in this case.

for(i in 1:length(days_alive)){
  current_day<-days_alive[i]
  second_DF<-cbind(second_DF,matrix(ncol=1, nrow=nrow(second_DF),dimnames=list(NULL,paste("Day ",current_day," Length"))))

  for(j in 1:length(individual_IDs)){
    current_individualID<-individual_IDs[j]

    # I changed "length" to "length2" to avoid confusion with the 
    # function length(). You also don't need which() here.
    length2 <- first_DF[first_DF$Individual %in% current_individualID 
                        & first_DF$DayAlive %in% current_day, "Length"]
    if (length(length2) > 0) {
      second_DF[j, i + initial_DF_width] <- length2
    } else {
      second_DF[j, i + initial_DF_width] <- NA
    }
  }
}
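The absence test in the loop above can also be done for all individual/day combinations at once. The following is a sketch of one way to do that with `match()`, which returns `NA` for keys it cannot find. The names `wanted`, `row_index` and `length_matrix` are introduced here for illustration, and the data is rebuilt with numeric columns.

```r
# Same example data as in the question, with numeric DayAlive and Length.
first_DF <- data.frame(
  Individual = c("A", "A", "A", "B", "B"),
  DayAlive   = c(1, 2, 3, 1, 3),
  Length     = c(1.3, 1.8, 2.4, 0.8, 1.7),
  stringsAsFactors = FALSE
)
individual_IDs <- unique(first_DF$Individual)
days_alive     <- sort(unique(first_DF$DayAlive))

# Every individual/day combination the wide table needs
# (Individual varies fastest, so the values fill a matrix column by column).
wanted <- expand.grid(Individual = individual_IDs,
                      DayAlive   = days_alive,
                      stringsAsFactors = FALSE)

# match() gives NA where a combination has no row in first_DF.
row_index <- match(paste(wanted$Individual, wanted$DayAlive),
                   paste(first_DF$Individual, first_DF$DayAlive))

# Indexing with NA yields NA, so absent rows become NA without an if().
length_matrix <- matrix(first_DF$Length[row_index],
                        nrow = length(individual_IDs),
                        dimnames = list(individual_IDs,
                                        paste("Day", days_alive, "Length")))
```

With this data, `length_matrix["B", "Day 2 Length"]` is `NA` and every other cell holds the recorded length.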

Upvotes: 2
