biocyberman


Back tracing parents/paths of two-column data of a tree

I have tree data serialized as two columns, P (parent) and C (child), like the following. The relationship from P to C is one-to-many, and from C to P it is many-to-one (each child has exactly one parent). So column P may contain duplicate values, but the values in column C are unique.

P, C
1, 2
1, 3
3, 4
2, 5
4, 6
# the same data as an R data.frame
df <- data.frame(P=c(1,1,3,2,4), C=c(2,3,4,5,6))

1. How do I efficiently implement a function func so that:

func(df, val) returns, as a vector, the full path from the root (1 in this case) down to val.

For example:

func(df, 3) returns c(1,3)
func(df, 5) returns c(1,2,5)
func(df, 6) returns c(1,3,4,6)
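Because every child has exactly one parent, one way to do this without extra packages is to turn the data into a child-to-parent lookup and walk upward until no parent is found. This is a minimal base-R sketch, assuming C is unique and the data contain no cycles (a cycle would make the loop run forever); the local name `parent` is hypothetical and only used for illustration.

```r
# Sketch: follow child -> parent links from `val` up to the root.
# Assumes column C is unique and the data form a tree (no cycles).
func <- function(df, val) {
  # Named vector: parent[as.character(child)] is that child's parent
  parent <- setNames(df$P, df$C)
  path <- val
  repeat {
    p <- parent[as.character(val)]
    if (is.na(p)) break   # val never appears in C, so it is the root
    val <- unname(p)
    path <- c(val, path)  # prepend the parent to the path
  }
  path
}

df <- data.frame(P = c(1, 1, 3, 2, 4), C = c(2, 3, 4, 5, 6))
func(df, 3)  # returns c(1, 3)
func(df, 5)  # returns c(1, 2, 5)
func(df, 6)  # returns c(1, 3, 4, 6)
```

Each call only visits the ancestors of `val`, so its cost grows with the depth of the node rather than with the number of rows being traversed.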

2. Alternatively, quickly transforming df to a lookup table like this also works for me:

C, Paths
2, c(1,2)
3, c(1,3)
4, c(1,3,4)
5, c(1,2,5)
6, c(1,3,4,6)
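The lookup table can also be built in base R as a data.frame with a list column, where each entry is the numeric path from the root. The sketch below memoises paths so that shared ancestors are resolved only once. It assumes C is unique, the data contain no cycles, and the root is the P value that never appears in C. The names `path_table`, `get_path`, `parent` and `paths` are hypothetical.

```r
# Sketch: build a C -> Paths lookup table with a memoised recursive walk.
# Assumes column C is unique and the data form a tree (no cycles).
path_table <- function(df) {
  parent <- setNames(df$P, df$C)  # child -> parent lookup
  paths <- list()                 # cache: node name -> path from the root
  get_path <- function(node) {
    key <- as.character(node)
    if (!is.null(paths[[key]])) return(paths[[key]])
    p <- parent[key]
    # The root has no parent; any other node extends its parent's path
    res <- if (is.na(p)) node else c(get_path(unname(p)), node)
    paths[[key]] <<- res          # store in the enclosing cache
    res
  }
  out <- data.frame(C = df$C)
  out$Paths <- lapply(df$C, get_path)  # list column: one numeric vector per row
  out
}

df <- data.frame(P = c(1, 1, 3, 2, 4), C = c(2, 3, 4, 5, 6))
tab <- path_table(df)
tab$Paths[[which(tab$C == 6)]]  # returns c(1, 3, 4, 6)
```

Keeping `Paths` as a list column (rather than a string) preserves the numeric vectors, so a lookup returns something directly usable as `c(1,3,4,6)`.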


Answers (1)

ThomasIsCoding


Here is a solution using igraph:

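library(igraph)
# Directed graph P -> C; vertex names are taken from the P and C columns
g <- graph_from_data_frame(df)
# For each child, find the simple path from the root vertex named "1"
# (in a tree there is exactly one) and collapse its vertex names into a string
df <- within(df,
             Path <- sapply(match(as.character(C), names(V(g))),
                            function(k) toString(names(unlist(all_simple_paths(g, "1", k))))))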

such that

> df
  P C       Path
1 1 2       1, 2
2 1 3       1, 3
3 3 4    1, 3, 4
4 2 5    1, 2, 5
5 4 6 1, 3, 4, 6

