Reputation: 79
with 2 dataframes as
df1
num organism tier A B C D E
1 1 Ham 1 <NA> a <NA> <NA> b
2 1 Ham 1 <NA> c <NA> <NA> b
3 1 Sta 1 b <NA> <NA> a b
4 1 Sta 1 a <NA> <NA> a c
5 1 Sta 1 <NA> <NA> <NA> <NA> <NA>
6 2 Sta 2 c c <NA> a b
7 2 Sta 2 a c <NA> b a
8 3 Ham 3 <NA> <NA> <NA> <NA> <NA>
df2
num tier name
1 1 1 Ham
2 1 1 Sta
3 2 2 Sta
4 3 3 Ham
i would like to find the columns which match the num, organism, and tier from df1 and insert the information from df1$A to df1$E. instead of inserting all the data, is there a way to choose only the highest level?
for example, if levels 'a' > 'b' > 'c', insert 'a' if 'a' is present, insert 'b' if 'b' is present wihout any 'a's, and so on. lastly, if only NAs or present, insert NA.
my final result should look as
df3
num tier name A B C D E
1 1 1 Ham <NA> a <NA> <NA> b
2 1 1 Sta a <NA> <NA> a b
3 2 2 Sta a c <NA> a a
4 3 3 Ham <NA> <NA> <NA> <NA> <NA>
Upvotes: 1
Views: 33
Reputation: 93813
I don't think you even need df2
. Just aggregate
after assigning the correct levels to an ordered
factor
:
ids <- match(c("num","organism","tier"), names(df1))
df1[-ids] <- lapply(df1[-ids], ordered, levels=c("c","b","a") )
aggregate(df1[-ids], df1[ids], FUN = max, na.rm=TRUE)
# num organism tier A B C D E
#1 1 Ham 1 <NA> a <NA> <NA> b
#2 1 Sta 1 a <NA> <NA> a b
#3 2 Sta 2 a c <NA> a a
#4 3 Ham 3 <NA> <NA> <NA> <NA> <NA>
Ignore the warnings()
- R just warns that you have taken the max
of nothing, in the case where you only have NA
values.
Upvotes: 2