Reputation: 341
I have the following dataframe
OCC1990 Skilllevel
3 1
8 2
12 2
14 3
15 1
As illustrated above it contains a long list of occupations assigned to a specific skill level.
My actual dataframe is a household survey with millions of rows, including a column which is also named OCC1990
.
My goal is to implement my assigned skill levels from the above-listed data frame into the household survey.
I applied in the past already the following code for smaller dataframes, which is a pretty manual way
cps_data[cps_data$OCC1990 %in% 3,"skilllevel"] <- 1
cps_data[cps_data$OCC1990 %in% 4:7,"skilllevel"] <- 1
cps_data[cps_data$OCC1990 %in% 8,"skilllevel"] <- 2
But due to the fact that I don't wanna spend hours copying pasting as well as it increases the probability of making mistakes I'm searching for a different, more direct way.
I've already tried to merge both dataframes, but this result in an error related to the size of the vector.
Is there another way than merging just the two dataframes to assign the skill level also to the occupations in the survey?
Many thanks in advance Xx freddy
Upvotes: 1
Views: 50
Reputation: 12723
Using data.table
for large dataset
create two vectors: levels
and labels
. The levels
contains unique values of OCC1990
and labels
contains the new skill levels you want to apply.
Now use levels
and labels
inside the factor
function to modify the skill level. (I used Skilllevel
= 3 for OCC1990
= 8 )
library(data.table)
setDT(df)
levels <- c(3:7,8) # unique values of OCC1990
labels <- c(rep(1,5), 3) # new Skill levels corresponding to OCC1990
setkey(df, OCC1990) # sort OCC1990 for speed before filtering
df[ OCC1990 %in% levels, Skilllevel := as.integer(as.character(factor(OCC1990, levels = levels, labels = labels)))]
head(df)
# OCC1990 Skilllevel
#1: 3 1
#2: 8 3
#3: 12 2
#4: 14 3
#5: 15 1
If you are still facing memory size issues, read in chunks of data from IO (use fread
) and apply the above operation and then append data to a new file.
Data:
df <- read.table(text='OCC1990 Skilllevel
3 1
8 2
12 2
14 3
15 1 ', header=TRUE)
Upvotes: 1