Reputation: 15730
I have the dataframe
test <- structure(list(
y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
y2005 = c("senior","senior","senior",NA, NA, NA)),
.Names = c("2002","2003","2004","2005"),
row.names = c(c(1:6)),
class = "data.frame")
> test
2002 2003 2004 2005
1 freshman freshman junior senior
2 freshman junior sophomore senior
3 freshman junior sophomore senior
4 sophomore sophomore senior <NA>
5 sophomore sophomore senior <NA>
6 senior senior <NA> <NA>
And I would like to munge the data so that each row keeps only its distinct steps, in order, as in
result <- structure(list(
y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
y2003 = c("junior","junior","junior","senior","senior",NA),
y2004 = c("senior","sophomore","sophomore",NA,NA,NA),
y2005 = c(NA,"senior","senior",NA, NA, NA)),
.Names = c("1","2","3","4"),
row.names = c(c(1:6)),
class = "data.frame")
> result
1 2 3 4
1 freshman junior senior <NA>
2 freshman junior sophomore senior
3 freshman junior sophomore senior
4 sophomore senior <NA> <NA>
5 sophomore senior <NA> <NA>
6 senior <NA> <NA> <NA>
I know that if I treated each row as a vector, I could do something like
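careerrow <- c(1,2,3,3,4)
# pair each element with its successor (index by position, not by value)
pairz <- lapply(seq_along(careerrow),function(i){c(careerrow[i],careerrow[i+1])})
# keep an element when it differs from its successor; the last one has no successor (NA) and is kept
uniquepairz <- careerrow[sapply(pairz,function(x){is.na(x[2]) || x[1]!=x[2]})]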
My difficulty is to apply that row-wise to my data table. I assume lapply is the way to go, but so far I am unable to solve this one.
Upvotes: 1
Views: 221
Reputation: 89097
lapply, when passed a data.frame, operates on its columns. That's because a data.frame is a list whose elements are the columns. Instead of lapply, you can use apply with MARGIN=1:
unique.padded <- function(x) {
  uniq <- unique(x)
  # pad with NA so every row keeps its original length
  c(uniq, rep(NA, length(x) - length(uniq)))
}
t(apply(test, 1, unique.padded))
# [,1] [,2] [,3] [,4]
# 1 "freshman" "junior" "senior" NA
# 2 "freshman" "junior" "sophomore" "senior"
# 3 "freshman" "junior" "sophomore" "senior"
# 4 "sophomore" "senior" NA NA
# 5 "sophomore" "senior" NA NA
# 6 "senior" NA NA NA
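One thing to keep in mind: unique drops every repeated value, not only consecutive repeats. If a row could ever return to an earlier level, and you want to keep that as a separate step, a run-length approach may be closer to the pairwise comparison in the question. A minimal sketch, where steps.padded and the revisit vector are made up for illustration and are not part of the original answer:

```r
# Hypothetical helper, not part of the original answer.
steps.padded <- function(x) {
  # drop NAs and names, then keep one value per run of consecutive repeats
  steps <- rle(unname(x[!is.na(x)]))$values
  # pad with NA so every row keeps its original length
  c(steps, rep(NA, length(x) - length(steps)))
}

# Made-up row in which a level is revisited
revisit <- c("freshman", "junior", "freshman", "senior")
unique(revisit)        # the second "freshman" is dropped
steps.padded(revisit)  # all four steps are kept
```

On the test data from the question no row revisits a level, so t(apply(test, 1, steps.padded)) should give the same matrix as above.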
Edit: I saw your comment about your final goal. I would do something like this:
table(sapply(apply(test, 1, function(x)unique(na.omit(x))),
paste, collapse = "_"))
# freshman_junior_senior freshman_junior_sophomore_senior
# 1 2
# senior sophomore_senior
# 1 2
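A caveat with this one-liner: apply returns a list here only because the rows have different numbers of distinct steps. When every row happens to yield the same number, apply simplifies the result to a matrix, and sapply(..., paste, collapse = "_") would then run over individual cells. A sketch of a variant that avoids the simplification, assuming all columns are character; the row.paths helper and the same data frame are hypothetical names used only for illustration:

```r
# Made-up data in which every row has exactly two distinct steps
same <- data.frame(a = c("freshman", "sophomore"),
                   b = c("junior", "senior"),
                   stringsAsFactors = FALSE)

# apply() simplifies equal-length results to a matrix, not a list
simplified <- apply(same, 1, function(x) unique(na.omit(x)))
is.matrix(simplified)

# Hypothetical helper: always returns exactly one path string per row
row.paths <- function(df, sep = "_") {
  vapply(seq_len(nrow(df)), function(i) {
    x <- unlist(df[i, ], use.names = FALSE)
    paste(unique(x[!is.na(x)]), collapse = sep)
  }, character(1))
}

table(row.paths(same))
```

vapply with character(1) makes the one-string-per-row expectation explicit, so a shape surprise fails loudly instead of producing a wrong table.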
Upvotes: 2
Reputation: 115435
If your aim is to count how many rows follow each pathway, you could use something like this. It uses data.table because of the nice way it handles lists as elements within a data.table (data.frame-like) object. I am using !duplicated(...) to remove the duplicates, as this is slightly more efficient than unique.
library(data.table)
library(reshape2)
# make the rownames a column
test$id <- rownames(test)
# put in long format
DT <- as.data.table(melt(test,id='id'))
# get the unique steps and concatenate into a unique identifier for each pathway
DL <- DT[!is.na(value), {
    .steps <- value[!duplicated(value)]
    stepid <- paste(.steps, collapse = '.')
    list(steps = list(.steps), stepid = stepid)
}, by = id]
## id steps stepid
## 1: 1 freshman,junior,senior freshman.junior.senior
## 2: 2 freshman,junior,sophomore,senior freshman.junior.sophomore.senior
## 3: 3 freshman,junior,sophomore,senior freshman.junior.sophomore.senior
## 4: 4 sophomore,senior sophomore.senior
## 5: 5 sophomore,senior sophomore.senior
## 6: 6 senior senior
# count the number per path
DL[, .N, by = stepid]
## stepid N
## 1: freshman.junior.senior 1
## 2: freshman.junior.sophomore.senior 2
## 3: sophomore.senior 2
## 4: senior 1
Upvotes: 3