plicht

Reputation: 133

How to remove redundant rows in a data.frame (by columns [1, 2] and vice versa)?

I obtained a distance.class table in which samples were compared against each other to calculate an index. As a result, each value appears twice and self-comparisons occur as well. See the example table below:

        Sample1 Sample2 Sample3
Sample1 0       0.5     1
Sample2 0.5     0       0.8
Sample3 1       0.8     0

I have already removed the self-comparisons (Sample1 vs. Sample1, etc.), but I do not know how to remove the redundant values (i.e. the upper half of the table). The desired output is a table like the one below, which I can then melt into a data.frame (shown after it) to build plots with. Each sample also has a specific type, which I want to use in the plots.

        Sample1 Sample2 Sample3
Sample1
Sample2 0.5
Sample3 1       0.8

Var1    Var2    Type1 Type2 Value
Sample1 Sample2 a     b     0.5
Sample1 Sample3 a     a     1
Sample2 Sample3 b     a     0.8

Upvotes: 0

Views: 208

Answers (1)

plicht

Reputation: 133

Thanks a lot. With usedist::dist_make() I was able to produce the intended solution.

After generating the matrix of class "dist" by calling phyloseq::distance(), I extracted the grouping variable from the phyloseq object with:

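library(phyloseq)

# `group` is assumed to hold the name of the grouping column in sample_data(physeq)
group2samp <- list()
group_list <- get_variable(sample_data(physeq), group)
for (groups in levels(group_list)) { # loop over the group levels (group_list must be a factor)
    target_group <- which(group_list == groups) # indices of the samples in this group
    group2samp[[groups]] <- sample_names(physeq)[target_group]
}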
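
To illustrate what this loop builds, here is a minimal base R sketch. It assumes plain vectors in place of the phyloseq object: `sample_ids` and the toy `group_list` are stand-ins for `sample_names(physeq)` and the result of `get_variable()`, with the group labels taken from the output shown further down. Under that assumption, `split()` should give the same named list as the loop.

```r
# Toy stand-ins (assumptions): in the answer these come from
# sample_names(physeq) and get_variable(sample_data(physeq), group).
sample_ids <- c("sample8", "sample9", "sample69", "sample70")
group_list <- factor(c("Plaque", "Patch", "Patch", "nonlesional"))

# One list element per group level, holding the sample names of that group
group2samp <- split(sample_ids, group_list)

print(group2samp[["Patch"]])
```

With a real phyloseq object, the two vectors would be replaced by the corresponding accessor calls; the loop above remains a perfectly valid way to do it.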

Then I melted the resulting "group2samp" list and rearranged the order of the first column to match with my distance matrix:

library(reshape2)    
item_groups = melt(group2samp)

library(dplyr)
item_groups = arrange(item_groups, value)
# reverse the row order so the groups line up with the sample order of my distance matrix
item_groups = item_groups[rev(seq_len(nrow(item_groups))), ]
item_groups = item_groups$L1 # extract only the grouping variable

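Sorting and reversing works when the labels of the distance matrix happen to be in that order. As a hedged alternative, the groups can be aligned by name with `match()`. The sketch below uses the toy matrix from the question and a hand-made `item_groups_df` with the columns `value` and `L1` that `melt()` returns for a list; both objects are illustrative assumptions, not my real data.

```r
# Toy distance matrix from the question, converted to class "dist"
m <- matrix(c(0,   0.5, 1,
              0.5, 0,   0.8,
              1,   0.8, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Sample", 1:3), paste0("Sample", 1:3)))
d <- as.dist(m)

# Hypothetical melted list, deliberately in a different order than the matrix
item_groups_df <- data.frame(value = c("Sample1", "Sample3", "Sample2"),
                             L1    = c("a", "a", "b"),
                             stringsAsFactors = FALSE)

# Look up each matrix label in the melted table and take its group
aligned_groups <- item_groups_df$L1[match(labels(d), item_groups_df$value)]
print(aligned_groups)
```

This way the group vector follows the label order of the "dist" object regardless of how the melted table is sorted.
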
library(usedist)
distances = dist_groups(distance_matrix, item_groups)

distances
     Item1    Item2      Group1      Group2                          Label   Distance
1    sample9  sample8       Patch      Plaque       Between Patch and Plaque 0.94344640
2    sample9 sample70       Patch nonlesional  Between nonlesional and Patch 0.60253312
3    sample9 sample69       Patch       Patch                   Within Patch 0.62086228
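
For the small example in the question, a similar long table can also be sketched in base R without usedist. This is only an illustration under assumptions: the matrix `m` and the `types` vector are rebuilt by hand from the tables in the question, and the column names follow the desired output there.

```r
# Toy matrix and sample types, copied from the question
m <- matrix(c(0,   0.5, 1,
              0.5, 0,   0.8,
              1,   0.8, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("Sample", 1:3), paste0("Sample", 1:3)))
types <- c(Sample1 = "a", Sample2 = "b", Sample3 = "a")

# Positions of the lower triangle: drops self-comparisons and duplicates
idx <- which(lower.tri(m), arr.ind = TRUE)
first  <- colnames(m)[idx[, "col"]]
second <- rownames(m)[idx[, "row"]]

# One row per unique pair, with the type of each sample attached
pairs <- data.frame(Var1  = first,
                    Var2  = second,
                    Type1 = unname(types[first]),
                    Type2 = unname(types[second]),
                    Value = m[idx],
                    row.names = NULL,
                    stringsAsFactors = FALSE)
print(pairs)
```

The Label column that dist_groups() adds ("Within ..." / "Between ...") is not reproduced here.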

Upvotes: 0
