FilipeTeixeira

Converting factor variable to numeric, and from numeric back to factor

I have a massive dataset (9,000,000 entries) with two factor columns (409 levels). It represents flights between airports over a certain period. The dataset below is shown after the conversion, meaning that "ORIGIN" and "DEST" are already in their numeric form.

  ORIGIN DEST weight        alpha
      1   24   1195 1.512274e-04
      1   78    844 2.557285e-03
    100    2   1615 3.176266e-17
    100    3   4196 9.111249e-09
    100    7   1221 6.471515e-10
    100   12    725 2.129114e-04

A second dataset has all the IATA codes, with their latitude and longitude.

           City IATA  Latitude Longitude
         Goroka  GKA -6.081690   145.392
         Madang  MAG -5.207080   145.789
    Mount Hagen  HGU -5.826790   144.296
         Nadzab  LAE -6.569803   146.726
   Port Moresby  POM -9.443380   147.220
          Wewak  WWK -3.583830   143.669

The current workflow is as follows:

  1. Convert the 2 columns to numeric (I need them in that form later)
  2. Convert the data set into an igraph graph
  3. Apply the filtering algorithm (which is why the columns are numeric)
  4. Convert the graph back to a data set
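A minimal sketch of step 1, assuming a small hypothetical `flights` data frame with factor `ORIGIN`/`DEST` columns (the real data has 409 airports and about 9,000,000 rows). It builds one shared, sorted level set for both columns, so that a given code means the same airport in `ORIGIN` and in `DEST`, and it keeps that `airports` vector around so the conversion can be reversed after filtering:

```r
# Hypothetical stand-in for the flights data
flights <- data.frame(
  ORIGIN = c("GKA", "GKA", "POM", "POM"),
  DEST   = c("MAG", "WWK", "GKA", "LAE"),
  weight = c(1195, 844, 1615, 4196),
  stringsAsFactors = TRUE
)

# One shared, sorted level set so a code means the same airport in both columns
airports <- sort(union(levels(flights$ORIGIN), levels(flights$DEST)))

# Step 1: replace each factor with its integer position in `airports`
flights$ORIGIN <- match(as.character(flights$ORIGIN), airports)
flights$DEST   <- match(as.character(flights$DEST), airports)
```

`match()` returns integer codes; `as.numeric(factor(x, levels = airports))` would produce the same codes as doubles. Either way, `airports` is the piece of state to save before the data goes into igraph.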

My problem is that I now want to convert those numbers back to the original factor labels, because I need the latitude and longitude from the second dataset.

Any ideas? I've tried pretty much everything I can think of.
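If the level set was saved before the conversion, step 4 reduces to indexing: `airports[code]` returns the IATA code for each numeric code, and `match()` against the second table's `IATA` column pulls in the coordinates. A minimal base-R sketch, assuming a hypothetical saved `airports` vector and a hypothetical `filtered` data frame standing in for the igraph output; the coordinates are the rows shown above:

```r
# Hypothetical level set saved before the numeric conversion
airports <- c("GKA", "LAE", "MAG", "POM", "WWK")

# Hypothetical result of the igraph filtering step (numeric codes)
filtered <- data.frame(ORIGIN = c(1L, 4L), DEST = c(3L, 2L), weight = c(1195, 4196))

# Coordinates table (the rows of the second dataset shown above)
coords <- data.frame(
  City      = c("Goroka", "Madang", "Mount Hagen", "Nadzab", "Port Moresby", "Wewak"),
  IATA      = c("GKA", "MAG", "HGU", "LAE", "POM", "WWK"),
  Latitude  = c(-6.081690, -5.207080, -5.826790, -6.569803, -9.443380, -3.583830),
  Longitude = c(145.392, 145.789, 144.296, 146.726, 147.220, 143.669)
)

# Step 4: numeric codes back to factors with the original level set
filtered$ORIGIN <- factor(airports[filtered$ORIGIN], levels = airports)
filtered$DEST   <- factor(airports[filtered$DEST], levels = airports)

# Look up coordinates by IATA code; match() keeps the row order of `filtered`
o <- match(as.character(filtered$ORIGIN), coords$IATA)
d <- match(as.character(filtered$DEST), coords$IATA)
filtered$ORIGIN_lat <- coords$Latitude[o]
filtered$ORIGIN_lon <- coords$Longitude[o]
filtered$DEST_lat   <- coords$Latitude[d]
filtered$DEST_lon   <- coords$Longitude[d]
```

Airports that are missing from the coordinates table come back as `NA`, which makes unmatched IATA codes easy to spot.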

Answers (2)

GGamba

I would store your factor levels before converting with as.numeric, and then reapply them when restoring the factor class.
An example to clarify what I mean:

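data(iris)
# Store the levels before converting
l <- levels(iris$Species)

# Convert to numeric (codes 1, 2, 3)
iris$Species <- as.numeric(iris$Species)
head(iris$Species)
class(iris$Species)

# Convert back to factor; levels = seq_along(l) pins code i to label l[i],
# even if some codes disappeared in between
iris$Species <- factor(iris$Species, levels = seq_along(l), labels = l)
head(iris$Species)
class(iris$Species)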
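Specifying `levels = seq_along(l)` matters once rows have been filtered: with `labels` alone, `factor()` pairs the labels with the distinct values actually present in the vector, so if the filtering step removes every row of some level, the number of labels no longer matches and `factor()` stops with an error. A small sketch with `iris`, where the filter is a hypothetical stand-in for the igraph step:

```r
data(iris)
l <- levels(iris$Species)          # "setosa" "versicolor" "virginica"
codes <- as.numeric(iris$Species)

# Hypothetical filter that drops every "versicolor" row (code 2)
kept <- codes[codes != 2]

# With labels only, 3 labels meet 2 distinct codes, so factor() errors
res <- try(factor(kept, labels = l), silent = TRUE)
inherits(res, "try-error")

# Fixing the levels to 1..3 keeps every code paired with its own label
restored <- factor(kept, levels = seq_along(l), labels = l)
table(restored)
```

The restored factor also keeps the dropped level ("versicolor" with a count of 0), so it stays compatible with unfiltered data that uses the same coding.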

effel

Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.

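library(dplyr)
data(warpbreaks)
original <- warpbreaks

# Lookup table: each distinct wool/tension combination with its numeric codes
value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()

# Replace the factors with their numeric codes
warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))

# Join the labels back; the clashing names get .x/.y suffixes,
# and the .y columns hold the restored factors
warpbreaks <- left_join(warpbreaks, value_label_map,
  by = c("wool" = "wool_num", "tension" = "tension_num"))

# Both should return TRUE
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)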
