Scott
Scott

Reputation: 79

inserting data by choosing from one of multiple levels

with 2 dataframes as

df1

  num organism tier    A    B    C    D    E
1   1      Ham    1 <NA>    a <NA> <NA>    b
2   1      Ham    1 <NA>    c <NA> <NA>    b
3   1      Sta    1    b <NA> <NA>    a    b
4   1      Sta    1    a <NA> <NA>    a    c
5   1      Sta    1 <NA> <NA> <NA> <NA> <NA>
6   2      Sta    2    c    c <NA>    a    b
7   2      Sta    2    a    c <NA>    b    a
8   3      Ham    3 <NA> <NA> <NA> <NA> <NA>


df2

  num tier name
1   1    1  Ham
2   1    1  Sta
3   2    2  Sta
4   3    3  Ham

i would like to find the columns which match the num, organism, and tier from df1 and insert the information from df1$A to df1$E. instead of inserting all the data, is there a way to choose only the highest level?

for example, if levels 'a' > 'b' > 'c', insert 'a' if 'a' is present, insert 'b' if 'b' is present wihout any 'a's, and so on. lastly, if only NAs or present, insert NA.

my final result should look as

df3

  num tier name    A     B    C    D    E
1   1    1  Ham  <NA>    a <NA> <NA>    b
2   1    1  Sta    a  <NA> <NA>    a    b
3   2    2  Sta    a     c <NA>    a    a
4   3    3  Ham  <NA> <NA> <NA> <NA> <NA>

Upvotes: 1

Views: 33

Answers (1)

thelatemail
thelatemail

Reputation: 93813

I don't think you even need df2. Just aggregate after assigning the correct levels to an ordered factor:

ids <- match(c("num","organism","tier"), names(df1))
df1[-ids] <- lapply(df1[-ids], ordered, levels=c("c","b","a") )
aggregate(df1[-ids], df1[ids], FUN = max, na.rm=TRUE)

#  num organism tier    A    B    C    D    E
#1   1      Ham    1 <NA>    a <NA> <NA>    b
#2   1      Sta    1    a <NA> <NA>    a    b
#3   2      Sta    2    a    c <NA>    a    a
#4   3      Ham    3 <NA> <NA> <NA> <NA> <NA>

Ignore the warnings() - R just warns that you have taken the max of nothing, in the case where you only have NA values.

Upvotes: 2

Related Questions