F.Lira
F.Lira

Reputation: 653

Select columns based on columns sum

Any suggestion to select the columns of the row when value =1 and the sum columns values =1. it means that I will just select unique values, non-shared with the other individuals.

indv. X Y Z W T J
A     1 0 1 0 0 1
B     0 1 1 0 0 0
C     0 0 1 1 0 0
D     0 0 1 0 1 0

A: X, J
B: Y
C: W
D: T

Upvotes: 0

Views: 2053

Answers (3)

Lodewic Van Twillert
Lodewic Van Twillert

Reputation: 783

Here you go! A solution in base r. First we simulate your data, a data.frame with named rows and columns.

You can use sapply() to loop over the column indices. A for-loop over the column indices will achieve the same thing.

Finally, save the results in a data.frame however you want.

# Simulate your example data
df <- data.frame(matrix(c(1, 0, 1, 0, 0, 1,
                          0, 1, 1, 0, 0, 0,
                          0, 0, 1, 1, 0, 0,
                          0, 0, 1, 0, 1, 0), nrow = 4, byrow = T))


# Names rows and columns accordingly
names(df) <- c("X", "Y", "Z", "W", "T", "J")
rownames(df) <- c("A", "B","C", "D")

> df
  X Y Z W T J
A 1 0 1 0 0 1
B 0 1 1 0 0 0
C 0 0 1 1 0 0
D 0 0 1 0 1 0

Then we select columns where the sum == 1- columns with unique values. For every one of these columns, we find the row of this value.

# Select columns with unique values (if sum of column == 1)
unique.cols <- which(colSums(df) == 1)
# For every one of these columns, select the row where row-value==1
unique.rows <- sapply(unique.cols, function(x) which(df[, x] == 1))

> unique.cols
X Y W T J 
1 2 4 5 6 

> unique.rows
X Y W T J 
1 2 3 4 1

The rows are not named correctly yet (they are still the element named of unique.cols). So we reference the rownames of df to get the rownames.

# Data.frame of unique values
#   Rows and columns in separate columns
df.unique <- data.frame(Cols = unique.cols,
                    Rows = unique.rows,
                    Colnames = names(unique.cols),
                    Rownames = rownames(df)[unique.rows],
                    row.names = NULL)

The result:

df.unique
  Cols Rows Colnames Rownames
1    1    1        X        A
2    2    2        Y        B
3    4    3        W        C
4    5    4        T        D
5    6    1        J        A

Edit:

This is how you could summarise the values per row using dplyr.

library(dplyr)

df.unique %>% group_by(Rownames) %>%
  summarise(paste(Colnames, collapse=", "))




   # A tibble: 4 x 2
  Rownames `paste(Colnames, collapse = ", ")`
  <fct>    <chr>                             
1 A        X, J                              
2 B        Y                                 
3 C        W                                 
4 D        T  

Upvotes: 3

akrun
akrun

Reputation: 886948

Here is an option with tidyverse. We gather the dataset to 'long' format, grouped by 'key', fiter the rows where 'val' is 1 and the sum of 'val is 1, grouped by 'indv.', summarise the 'key' by pasteing the elements together

library(dplyr)
library(tidyr)
gather(df1, key, val, -indv.) %>%         
     group_by(key) %>% 
     filter(sum(val) == 1, val == 1) %>%
     group_by(indv.) %>% 
     summarise(key = toString(key))
# A tibble: 4 x 2
#   indv. key  
#   <chr> <chr>
#1 A     X, J 
#2 B     Y    
#3 C     W    
#4 D     T    

Upvotes: 2

Sotos
Sotos

Reputation: 51582

One idea is to use rowwise apply to find the columns with 1, after we filter out the columns with sum != to 1, i.e.

apply(df[colSums(df) == 1], 1, function(i) names(df[colSums(df) == 1])[i == 1])

$A
[1] "X" "J"

$B
[1] "Y"

$C
[1] "W"

$D
[1] "T"

You can play around with the output to get it to desired state, i.e.

apply(df[colSums(df) == 1], 1, function(i) toString(names(df[colSums(df) == 1])[i == 1]))
#     A      B      C      D 
#"X, J"    "Y"    "W"    "T" 

Or

data.frame(cols = apply(df[colSums(df) == 1], 1, function(i) toString(names(df[colSums(df) == 1])[i == 1])))

#  cols
#A X, J
#B    Y
#C    W
#D    T

Upvotes: 3

Related Questions