Reputation: 1160
I have a massive dataset (9,000,000 entries) with two columns that are factors (409 levels). It represents flights between airports over a certain period. The dataset below is shown after conversion, meaning that "ORIGIN" and "DEST" are in their numeric form.
ORIGIN DEST weight alpha
1 24 1195 1.512274e-04
1 78 844 2.557285e-03
100 2 1615 3.176266e-17
100 3 4196 9.111249e-09
100 7 1221 6.471515e-10
100 12 725 2.129114e-04
A second dataset has all the IATA codes, with their latitude and longitude.
City IATA Latitude Longitude
Goroka GKA -6.081690 145.392
Madang MAG -5.207080 145.789
Mount Hagen HGU -5.826790 144.296
Nadzab LAE -6.569803 146.726
Port Moresby POM -9.443380 147.220
Wewak WWK -3.583830 143.669
The current flow is the following:
My problem is that I now want to convert the numbers back to the original factor labels, because I need the latitude and longitude from the second dataset.
Any ideas? I've tried pretty much everything I can think of.
Upvotes: 3
Views: 3512
Reputation: 13680
I would store the factor levels before converting the column with as.numeric(), and then reapply them when restoring the factor class.
An example to clarify what I mean:
data(iris)
# Store the levels
l<-levels(iris$Species)
# Convert to numeric
iris$Species <- as.numeric(iris$Species)
head(iris$Species)
class(iris$Species)
# Convert back to factor; explicit levels keep the mapping right even if a code is absent
iris$Species <- factor(iris$Species, levels = seq_along(l), labels = l)
head(iris$Species)
class(iris$Species)
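The same idea can be sketched for an airport column. This is only an illustration under assumptions: the toy vector `origin`, its values, and the filtering step are made up and are not the asker's real data. Indexing the stored levels by the numeric code recovers the labels, and it also works when some codes no longer occur in the data.

```r
# Toy stand-in for one airport column (made-up illustration values)
origin <- factor(c("GKA", "MAG", "HGU", "MAG"))

# Store the levels before converting
l <- levels(origin)
codes <- as.numeric(origin)

# Suppose some rows are filtered out while the column is numeric
kept <- codes[codes != 2]

# Index the stored levels by code to recover the labels
restored <- factor(l[kept], levels = l)
as.character(restored)
```

Once the labels are back, they can be matched against the IATA column of the second dataset, for example with merge().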
Upvotes: 2
Reputation: 1421
Before coercing the factors to numeric, create a lookup table of numeric-factor label pairs. At the end of your workflow, merge the factor labels back into your data.
library(dplyr)
data(warpbreaks)
original <- warpbreaks
value_label_map <- warpbreaks %>%
  select(wool, tension) %>%
  mutate(wool_num = as.numeric(wool), tension_num = as.numeric(tension)) %>%
  distinct()
warpbreaks <- warpbreaks %>%
  mutate(wool = as.numeric(wool), tension = as.numeric(tension))
warpbreaks <- left_join(warpbreaks, value_label_map,
                        by = c("wool" = "wool_num", "tension" = "tension_num"))
identical(original$wool, warpbreaks$wool.y)
identical(original$tension, warpbreaks$tension.y)
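For the flights case in the question, the lookup-table approach might look roughly like the sketch below. It rests on assumptions: the coordinates are copied from the question's airport table, but the flight rows, the weights, and the names `codes`, `lookup` and `flights_geo` are hypothetical, and it assumes ORIGIN and DEST were converted using one shared set of levels so that a code means the same airport in both columns.

```r
library(dplyr)

# Coordinates taken from the question's airport table
airports <- data.frame(
  IATA = c("GKA", "MAG", "HGU"),
  Latitude = c(-6.081690, -5.207080, -5.826790),
  Longitude = c(145.392, 145.789, 144.296)
)

# Assumption: both columns share one level set
codes <- sort(unique(airports$IATA))

# Made-up flight rows, already in numeric form
flights <- data.frame(
  ORIGIN = as.numeric(factor(c("GKA", "MAG", "HGU"), levels = codes)),
  DEST   = as.numeric(factor(c("MAG", "HGU", "GKA"), levels = codes)),
  weight = c(10, 20, 30)
)

# Lookup table: numeric code -> IATA code -> coordinates
lookup <- data.frame(code = as.numeric(seq_along(codes)), IATA = codes) %>%
  left_join(airports, by = "IATA")

# Join once per column, renaming the lookup columns to avoid name clashes
flights_geo <- flights %>%
  left_join(rename(lookup, ORIGIN_IATA = IATA, ORIGIN_LAT = Latitude,
                   ORIGIN_LON = Longitude),
            by = c("ORIGIN" = "code")) %>%
  left_join(rename(lookup, DEST_IATA = IATA, DEST_LAT = Latitude,
                   DEST_LON = Longitude),
            by = c("DEST" = "code"))
```

Because the lookup table has one row per airport rather than one per flight, it stays small (409 rows in the asker's case) even when the flights table has millions of rows.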
Upvotes: 0