user08041991
user08041991

Reputation: 637

subset data across multiple columns

Consider the data here:

X <- 1:4
Ya <- 10:13
Yb <- 2:5
Yc <- c(10,11,6,NA)

df <- data.frame(X, Ya, Yb, Yc)

For each X value, I want to extract the unique Y values (from Ya:Yc)

So I am trying to achieve an output:

# the first number is the X value, then the proceeding numbers are
# the unique Ya:Yc values for each row
# 1, 10, 2
# 2, 11, 3
# 3, 12, 4 , 6
# 4, 13, 5

I have tried using a simple for loop.

output1 <- c(NA,NA,NA,NA)

for(i in 1:4) {
  output1[i] <- c(i,as.numeric(unique(df[i,2:4 ])))
}

Upvotes: 1

Views: 406

Answers (2)

akrun
akrun

Reputation: 887048

Here is an option using data.table. Convert the data.frame to data.table (setDT(df)), melt from 'wide' to 'long' format, get the unique elements by 'X' and 'value' column, then dcast from 'long' to 'wide' format.

library(data.table)
dcast(unique(melt(setDT(df), id.var="X"),
          by = c("X", "value")), X~variable, value.var="value")
#    X Ya Yb Yc
#1: 1 10  2 NA
#2: 2 11  3 NA
#3: 3 12  4  6
#4: 4 13  5 NA

Upvotes: 2

Steven Beaupr&#233;
Steven Beaupr&#233;

Reputation: 21621

Try:

library(dplyr)
library(tidyr)

df %>%
  gather(key, value, -X) %>%
  group_by(X) %>%
  distinct(value) %>%
  spread(key, value)

Which gives:

#Source: local data frame [4 x 4]
#Groups: X [4]
#
#      X    Ya    Yb    Yc
#  (int) (dbl) (dbl) (dbl)
#1     1    10     2    NA
#2     2    11     3    NA
#3     3    12     4     6
#4     4    13     5    NA

Upvotes: 3

Related Questions