Reputation: 63
I have a data frame with 1200 individual cases, each present in duplicate, for a total of 2400 rows. The case IDs are in one column, e.g. A1.1234567_10 and A1.1234567_20. There are multiple other columns, all factors, that I would like to compare so that I can see whether each duplicate pair has the same or a discrepant result in each column. How can I get a logical value for each factor column? I want to select each case by its ID (e.g. A1.1234567) and compare the rows with the _10 and _20 suffixes:
Example (one duplicate pair from the data frame):
A1.1234567_10 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL
A1.1234567_20 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL ABNORMAL NORMAL
I'd like the output to look like this (a new data frame):
A1.1234567 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
This would repeat for every sample down the column, comparing _10 and _20 for each unique ID.
Upvotes: 1
Views: 160
Reputation: 18691
Another approach with tidyverse (credit to @alistaire for the dput data):
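library(tidyverse)
library(stringr)

df %>%
  # strip the "_10"/"_20" suffix so both duplicates share one ID
  group_by(ID = str_extract(ID, ".+(?=_)")) %>%
  # TRUE when a column holds a single distinct value within the pair
  summarize_all(funs(dim(table(.)) == 1))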
Result:
# A tibble: 1 x 9
ID var1 var2 var3 var4 var5 var6 var7 var8
<chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A1.1234567 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
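Note that funs() is deprecated in newer dplyr releases. Below is a minimal sketch of the same idea written with across(), assuming dplyr 1.0.0 or later and the two-row example data from the other answer. n_distinct() is swapped in for dim(table(.)); unlike table(), it counts NA as a value by default, so a pair with one NA would come out FALSE.

```r
library(dplyr)
library(stringr)

# Example data: one duplicate pair (same values as the dput in the other answer)
df <- data.frame(
  ID   = c("A1.1234567_10", "A1.1234567_20"),
  var1 = c("NORMAL", "NORMAL"),
  var2 = c("NORMAL", "NORMAL"),
  var3 = c("NORMAL", "NORMAL"),
  var4 = c("NORMAL", "NORMAL"),
  var5 = c("NORMAL", "NORMAL"),
  var6 = c("NORMAL", "NORMAL"),
  var7 = c("NORMAL", "ABNORMAL"),
  var8 = c("NORMAL", "NORMAL"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # drop the "_10"/"_20" suffix so both duplicates fall into one group
  group_by(ID = str_extract(ID, ".+(?=_)")) %>%
  # one distinct value in the group means the pair agrees
  summarise(across(everything(), ~ n_distinct(.x) == 1))

result
```

On this data, var7 is the only column where the pair disagrees.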
Upvotes: 0
Reputation: 43354
Here's a tidyverse option:
library(tidyverse)
df <- structure(list(ID = c("A1.1234567_10", "A1.1234567_20"),
                     var1 = c("NORMAL", "NORMAL"),
                     var2 = c("NORMAL", "NORMAL"),
                     var3 = c("NORMAL", "NORMAL"),
                     var4 = c("NORMAL", "NORMAL"),
                     var5 = c("NORMAL", "NORMAL"),
                     var6 = c("NORMAL", "NORMAL"),
                     var7 = c("NORMAL", "ABNORMAL"),
                     var8 = c("NORMAL", "NORMAL")),
                .Names = c("ID", "var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8"),
                class = "data.frame", row.names = c(NA, -2L))
# separate group variable from observation label
df_tidy <- df %>% separate(ID, c('ID', 'obs'), sep = '_')
df_tidy
#> ID obs var1 var2 var3 var4 var5 var6 var7 var8
#> 1 A1.1234567 10 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL
#> 2 A1.1234567 20 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL ABNORMAL NORMAL
df_tidy %>%
  select(-obs) %>%
  group_by(ID) %>%
  # lift(`==`) calls `==` with the group's two values as its two arguments
  summarise_all(lift(`==`))
#> # A tibble: 1 x 9
#> ID var1 var2 var3 var4 var5 var6 var7 var8
#> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 A1.1234567 TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
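Because the comparison above hands each group's values to == as two arguments, it relies on every ID having exactly two rows. Below is a hedged sketch of a check you could run first to list any ID that is not a clean pair. The B2.7654321 row is made up for illustration and is not from the question's data.

```r
library(dplyr)
library(tidyr)

# Illustrative data: one complete pair plus one hypothetical unpaired case
df_check <- data.frame(
  ID   = c("A1.1234567_10", "A1.1234567_20", "B2.7654321_10"),
  var1 = c("NORMAL", "NORMAL", "NORMAL"),
  stringsAsFactors = FALSE
)

unpaired <- df_check %>%
  # split the label into the case ID and the _10/_20 suffix
  separate(ID, c("ID", "obs"), sep = "_") %>%
  # count rows per case and keep only the cases that are not pairs
  count(ID) %>%
  filter(n != 2)

unpaired
```

If unpaired has zero rows on the real data, every case has exactly two rows and the pairwise comparison applies to all of them.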
Upvotes: 3