Reputation: 77
I have the following df below:
name name..2 IGD
1 yaaA recF 16
2 recF yaaB 18
3 yaaD yaaE 22
4 dck dgk -3
5 dnaX yaaK 24
6 yaaK recR 15
7 recR yaaL 18
8 xpaC yaaN 19
9 yaaO tmk -3
10 yaaQ yaaR 13
11 yaaR holB 12
12 holB yaaT 3
13 yaaT yabA 15
14 yabB yazA -13
15 yazA yabC -25
I am trying to find a way to paste together the values from name and name..2 where name..2 matches name in the next row and put that into a new df that should look like this:
1 yaaA recF
2 yaaD
3 dck
4 dnaX yaaK recR
5 xpaC
6 yaaO
7 yaaQ yaaR holB yaaT
8 yabB yazA
Is there a r function I can use for this? I have tried searching SO, but haven't found a solution yet to this problem. Thanks in advance for the help.
Upvotes: 2
Views: 419
Reputation: 887128
We can do this in data.table
as well
library(data.table)
setDT(df)[, .(name = toString(name)),
.(group = cumsum(name != shift(name2, fill = TRUE)))]
# group name
#1: 1 yaaA, recF
#2: 2 yaaD
#3: 3 dck
#4: 4 dnaX, yaaK, recR
#5: 5 xpaC
#6: 6 yaaO
#7: 7 yaaQ, yaaR, holB, yaaT
#8: 8 yabB, yazA
df <- structure(list(name = c("yaaA", "recF", "yaaD", "dck", "dnaX",
"yaaK", "recR", "xpaC", "yaaO", "yaaQ", "yaaR", "holB", "yaaT",
"yabB", "yazA"), name2 = c("recF", "yaaB", "yaaE", "dgk", "yaaK",
"recR", "yaaL", "yaaN", "tmk", "yaaR", "holB", "yaaT", "yabA",
"yazA", "yabC"), IGD = c(16L, 18L, 22L, -3L, 24L, 15L, 18L, 19L,
-3L, 13L, 12L, 3L, 15L, -13L, -25L)), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"))
Upvotes: 0
Reputation: 50678
Here is a different option drawing on identifying clusters
within an igraph
library(igraph)
library(tidyverse)
df %>%
select(-IGD) %>%
graph_from_data_frame() %>%
clusters() %>%
magrittr::extract2(1) %>%
split(., .) %>%
map_dfr(~tibble(x = toString(names(.x)[-length(.x)])))
## A tibble: 8 x 1
# x
# <chr>
#1 yaaA, recF
#2 yaaD
#3 dck
#4 dnaX, yaaK, recR
#5 xpaC
#6 yaaO
#7 yaaQ, yaaR, holB, yaaT
#8 yabB, yazA
The idea is to construct an igraph
from df[c("name", "name..2")]
and then to identify clusters of connected nodes. The clusters are then the groups, and all we need to do is remove the last element (node).
df <- read.table(text =
" name name..2 IGD
1 yaaA recF 16
2 recF yaaB 18
3 yaaD yaaE 22
4 dck dgk -3
5 dnaX yaaK 24
6 yaaK recR 15
7 recR yaaL 18
8 xpaC yaaN 19
9 yaaO tmk -3
10 yaaQ yaaR 13
11 yaaR holB 12
12 holB yaaT 3
13 yaaT yabA 15
14 yabB yazA -13
15 yazA yabC -25", header = T)
Upvotes: 2
Reputation: 323236
In Base R , We using tail
head
and cumsum
create the group key , then using aggregate
df$id=cumsum(c(TRUE, tail(df$name,-1) != head(df$name2,-1)))
output=aggregate(name ~ id, data = df, toString)
output
id name
1 1 yaaA, recF
2 2 yaaD
3 3 dck
4 4 dnaX, yaaK, recR
5 5 xpaC
6 6 yaaO
7 7 yaaQ, yaaR, holB, yaaT
8 8 yabB, yazA
Upvotes: 2
Reputation: 388982
The logic is similar to @Wen-Ben here, a dplyr
way to do this
library(dplyr)
df %>%
group_by(group = cumsum(name != lag(name2, default = TRUE))) %>%
summarise(name = toString(name))
# group name
# <int> <chr>
#1 1 yaaA, recF
#2 2 yaaD
#3 3 dck
#4 4 dnaX, yaaK, recR
#5 5 xpaC
#6 6 yaaO
#7 7 yaaQ, yaaR, holB, yaaT
#8 8 yabB, yazA
The main idea is to create a grouping variable which increments every time name != name2
.
Upvotes: 3