LiberLashton

Reputation: 3

R: Conditionally duplicating data from columns into rows

In R, I am trying to turn each row into two rows. Each new row would repeat some of the original row's data, but its first data column would hold a single name taken from one of the existing columns.

I am setting this data up for use in Tableau, where I want to create a network visualization. I don't want the customers who enter the data to have to type a lot of duplicate data just to make this visualization possible.

My current data looks like this:

            Connection.ID     From             To              Note
  1         1                 Niamh MacCallum  James Fraser    Niamh and James are coworkers
  2         2                 James Fraser     Simon David     James and Simon are brothers
  3         3                 Niamh MacCallum  Tom Ashton      Niamh recruited Tom to join her organization

This is some fake data I created that replicates my company's data, but the goal is being able to visualize connections between our employees and customers/volunteers they meet and form professional relationships with.

I would like my data to look like this, which I would then export to a CSV:

            Connection.ID     Node.Name        Notes
  1         1                 Niamh MacCallum  Niamh and James are coworkers
  2         1                 James Fraser     Niamh and James are coworkers
  3         2                 James Fraser     James and Simon are brothers
  4         2                 Simon David      James and Simon are brothers     
  5         3                 Niamh MacCallum  Niamh recruited Tom to join her organization
  6         3                 Tom Ashton       Niamh recruited Tom to join her organization

I've found a couple of resources that do something similar. The best one is a previously asked question (conditionally duplicating rows in a data frame), but it didn't quite get me what I needed, or I may honestly have been misapplying it. I thought I could get the same result by removing the "To" column and renaming "From" to "Node.Name", but I ended up with repetitive data: six copies of each row, with notes attached to the wrong connections.

I'd appreciate any help! I'm fairly new to R and self-taught, so if you have a solution or a resource where I can learn the solution that'd be great too. Thanks!

EDIT: Found a similar question I had not seen before, so I am adding it here in case someone else finds this and can reference both of them: Create network files from "classic" dataframe in R - igraph

Upvotes: 0

Views: 59

Answers (1)

R. Schifini

Reputation: 9313

This is a wide-to-long transformation, which can be done with melt() from the reshape2 package. Do:

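library(reshape2)

# Stack the From and To columns into a single Node.Name column,
# repeating Connection.ID and Note for each stacked row
df2 <- melt(data = df,
            id.vars = c("Connection.ID", "Note"),
            measure.vars = c("From", "To"),
            variable.name = "From_To",
            value.name = "Node.Name")

# Remove the unwanted From_To column
df2$From_To <- NULL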

Result:

> df2
  Connection.ID                                             Note       Node.Name
1             1                    Niamh and James are coworkers Niamh MacCallum
2             2                     James and Simon are brothers    James Fraser
3             3     Niamh recruited Tom to join her organization Niamh MacCallum
4             1                    Niamh and James are coworkers    James Fraser
5             2                     James and Simon are brothers     Simon David
6             3     Niamh recruited Tom to join her organization      Tom Ashton
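Note that melt() stacks all the From rows first and then all the To rows, so the result above is not grouped by connection the way the question's target table is. A minimal sketch of one way to get that layout, assuming the data frame is rebuilt from the sample rows in the question and that From_To is kept until after sorting (the output file name connections_long.csv is only a placeholder):

```r
library(reshape2)

# Sample data copied from the question
df <- data.frame(
  Connection.ID = 1:3,
  From = c("Niamh MacCallum", "James Fraser", "Niamh MacCallum"),
  To   = c("James Fraser", "Simon David", "Tom Ashton"),
  Note = c("Niamh and James are coworkers",
           "James and Simon are brothers",
           "Niamh recruited Tom to join her organization"),
  stringsAsFactors = FALSE
)

df2 <- melt(data = df,
            id.vars = c("Connection.ID", "Note"),
            measure.vars = c("From", "To"),
            variable.name = "From_To",
            value.name = "Node.Name")

# Sort by connection, with the From row ahead of the To row
# (From_To is a factor whose levels are in the order From, To)
df2 <- df2[order(df2$Connection.ID, df2$From_To), ]

# Keep only the wanted columns, in the order shown in the question
df2 <- df2[, c("Connection.ID", "Node.Name", "Note")]

# Rename Note to Notes to match the target table, and reset the row names
names(df2)[names(df2) == "Note"] <- "Notes"
rownames(df2) <- NULL

# Export for Tableau; the file name is a placeholder
write.csv(df2, "connections_long.csv", row.names = FALSE)
```

Sorting before dropping From_To is what keeps each connection's two rows in From-then-To order. If that order does not matter, sorting on Connection.ID alone is enough.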

Upvotes: 0
