Reputation: 1944
I'm trying to prep data for use in various network visualisation applications in R and also in Gephi. These formats want numeric identifiers that link between two databases. I have figured out the latter part, but I'm not able to find a succinct way to create a numeric ID variable across columns in a dataframe. Here's some reproducible code that illustrates what I'm trying to do.
org.data <- data.frame(source=c('bob','sue','ann','john','sinbad'),
target=c('sinbad','turtledove','Aerosmith','bob','john'))
desired.data <- data.frame(source=c('1','2','3','4','5'),
target=c('5','6','7','1','4'))
org.data
source target
1 bob sinbad
2 sue turtledove
3 ann Aerosmith
4 john bob
5 sinbad john
desired.data
source target
1 1 5
2 2 6
3 3 7
4 4 1
5 5 4
Upvotes: 1
Views: 80
Reputation: 38510
Here's a base R method using match on the unlisted unique names in the original data.frame.
To replace the current data.frame, use
org.data[] <- sapply(org.data, match, table=unique(unlist(org.data)))
Here, sapply loops through the variables in org.data and applies match to each. match returns the position of each element of its first argument in the table argument. Here, table is the unlisted unique elements in org.data: unique(unlist(org.data)). In this case, sapply returns a matrix. It is converted to a data.frame, replacing the original, by appending [] to org.data in org.data[] <-. This construction can be thought of as preserving the structure of the original object during the assignment.
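To make the two pieces above concrete, here is a minimal sketch on toy data. The names ids, pos, df and m are made up for illustration and are not part of the original question.

```r
# Hypothetical toy data, only to illustrate match() and the `[] <-` assignment
ids <- c("bob", "sue", "ann")

# match() gives the position of each value in `table`, or NA when it is absent
pos <- match(c("ann", "bob", "zed"), table = ids)

df <- data.frame(a = c("bob", "sue"), b = c("ann", "bob"),
                 stringsAsFactors = FALSE)

# sapply() simplifies the per-column results to an integer matrix
m <- sapply(df, match, table = ids)

# Assigning into df[] keeps df a data.frame with its original column names
df[] <- m
```

Assigning with df <- m instead would replace the data.frame with the matrix itself, which is the difference the [] makes.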
To construct a new data.frame, use
setNames(data.frame(sapply(org.data, match, table=unique(unlist(org.data)))),
names(org.data))
Or better, as Henrik suggests, it would probably be easier to first create a copy of the data.frame and then use the first line of code to fill in the copy, rather than using setNames and data.frame.
desired.data <- org.data
desired.data[] <- sapply(org.data, match, table=unique(unlist(org.data)))
Both of these return
source target
1 1 5
2 2 6
3 3 7
4 4 1
5 5 4
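If you also need to know which name sits behind each ID (for example as a node table next to the edge list), the same unique vector can double as a lookup. This is only a sketch: the object names node.names, nodes and edges and the column names id and label are my own assumptions, not something Gephi or any package requires in this exact form.

```r
org.data <- data.frame(source = c('bob', 'sue', 'ann', 'john', 'sinbad'),
                       target = c('sinbad', 'turtledove', 'Aerosmith', 'bob', 'john'),
                       stringsAsFactors = FALSE)

# All distinct names, in order of first appearance (source column first)
node.names <- unique(unlist(org.data, use.names = FALSE))

# Hypothetical node table: one row per name, the ID is its position
nodes <- data.frame(id = seq_along(node.names), label = node.names,
                    stringsAsFactors = FALSE)

# Edge list with names replaced by IDs; org.data itself is left untouched
edges <- org.data
edges[] <- lapply(org.data, match, table = node.names)

# Look a name back up from an ID
nodes$label[edges$target[2]]
```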
Upvotes: 4
Reputation: 9618
You could try this:
org.data[] <- as.numeric(factor(c(as.matrix(org.data)), levels = unique(c(as.matrix(org.data)))))
org.data
source target
1 1 5
2 2 6
3 3 7
4 4 1
5 5 4
Upvotes: 3
Reputation: 6750
Convert to factors, then to integers.
org.data <- data.frame(source=c('bob','sue','ann','john','sinbad'),
target=c('sinbad','turtledove','Aerosmith','bob','john'))
# need to make sure that columns are characters, not factors
org.data$source <- as.character(org.data$source)
org.data$target <- as.character(org.data$target)
# define possible values that cover the two columns
levels <- unique(c(org.data$source, org.data$target))
# factorize, then cast to integer
org.data$source <- as.integer(factor(org.data$source, levels=levels))
org.data$target <- as.integer(factor(org.data$target, levels=levels))
org.data
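If there are more than two columns, the same steps can be wrapped in a small helper so each column does not have to be handled by name. This is a sketch under the assumption that every column of the data.frame holds names that should share one ID space; to_ids is a hypothetical function name.

```r
# Hypothetical helper: map every column of df onto one shared set of integer IDs
to_ids <- function(df) {
  # make sure that columns are characters, not factors
  df[] <- lapply(df, as.character)
  # possible values that cover all columns, in order of first appearance
  lv <- unique(unlist(df, use.names = FALSE))
  # factorize with the shared levels, then cast to integer
  df[] <- lapply(df, function(col) as.integer(factor(col, levels = lv)))
  df
}

org.data <- data.frame(source = c('bob', 'sue', 'ann', 'john', 'sinbad'),
                       target = c('sinbad', 'turtledove', 'Aerosmith', 'bob', 'john'))
res <- to_ids(org.data)
res
```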
Upvotes: 0
Reputation: 17648
You can try the following. The idea is to create factors using levels that cover all unique names.
library(tidyverse)
org.data %>%
  mutate(source2 = factor(source, levels=unique(unlist(org.data)), labels=1:length(unique(unlist(org.data))))) %>%
  mutate(target2 = factor(target, levels=unique(unlist(org.data)), labels=1:length(unique(unlist(org.data)))))
source target source2 target2
1 bob sinbad 1 5
2 sue turtledove 2 6
3 ann Aerosmith 3 7
4 john bob 4 1
5 sinbad john 5 4
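Note that source2 and target2 above are factors whose labels happen to look like numbers. If you need real integer columns, be aware that as.integer() on a factor returns the internal level codes, not the labels. With labels 1:n in level order the two coincide, so as.integer() is enough here, but in general they differ. A small base R sketch with a made-up factor f:

```r
# Hypothetical factor whose labels do not equal the level positions
f <- factor(c("sue", "bob", "sue"), levels = c("bob", "sue"), labels = c(10, 20))

codes  <- as.integer(f)                # internal codes: position of each level
labels <- as.integer(as.character(f))  # the labels themselves, read as numbers
```

Here codes is 2 1 2 while labels is 20 10 20.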
Upvotes: 0