David Vivas
David Vivas

Reputation: 77

How to order the rows information of a data set

please help me... I have a data set containing information about academic degrees per year, like this:

Year1  Deg_Year1  Year2  Deg_Year2  Year3  Deg_Year3  Year4  Deg_Year4  Year5  Deg_Year5   
2001   College    2004   Master      NA      NA        NA       NA        NA       NA
2004   College    2004   Master      2010    PHD       NA       NA        NA       NA 
2006   Master     2006   College     NA      NA        NA       NA        NA       NA
2016   Master     NA     NA          NA      NA        NA       NA        NA       NA
2002   Master     2003   Master      2004    College   2004     Master    NA       NA  
2002   Master     2002   College     NA      NA        NA       NA        NA       NA 

I want to obtain a data frame that contains the year and the highest academic degree obtained before 2015, like this:

YearX   Highest_Degree
2004    Master
2010    PHD
2006    Master
NA      NA
2004    Master
2002    Master

Can someone please help me?
Thank you !

Upvotes: 1

Views: 40

Answers (1)

akrun
akrun

Reputation: 887881

We can create a vector of degrees in an order and then match against the 'Deg_Year' columns, get the maximum value in each row with max.col to subset the values and the corresponding 'Year' in each row

v1 <- c('Master', 'PHD')
nm1 <- grep('Deg', names(df1))
m1 <- sapply(df1[nm1], match, table = v1, nomatch = 0)
i1 <- max.col(m1) * NA^(!rowSums(m1!=0))
YearX <- df1[nm1-1][cbind(seq_len(nrow(df1)), i1)]
Highest_Degree <- df1[nm1][cbind(seq_len(nrow(df1)), i1)]
data.frame(YearX, Highest_Degree)
#  YearX Highest_Degree
#1  2004         Master
#2  2010            PHD
#3  2006         Master
#4    NA           <NA>
#5  2004         Master
#6  2002         Master

data

 df1 <- structure(list(Year1 = c(2001L, 2004L, 2006L, 2016L, 2002L, 2002L
), Deg_Year1 = c("College", "College", "Master", "College", "Master", 
"Master"), Year2 = c(2004L, 2004L, 2006L, NA, 2003L, 2002L), 
    Deg_Year2 = c("Master", "Master", "College", NA, "Master", 
    "College"), Year3 = c(NA, 2010L, NA, NA, 2004L, NA), Deg_Year3 = c(NA, 
    "PHD", NA, NA, "College", NA), Year4 = c(NA, NA, NA, NA, 
    2004L, NA), Deg_Year4 = c(NA, NA, NA, NA, "Master", NA), 
    Year5 = c(NA, NA, NA, NA, NA, NA), Deg_Year5 = c(NA, NA, 
    NA, NA, NA, NA)), .Names = c("Year1", "Deg_Year1", "Year2", 
"Deg_Year2", "Year3", "Deg_Year3", "Year4", "Deg_Year4", "Year5", 
"Deg_Year5"), class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

Related Questions