freddywit
freddywit

Reputation: 341

Allocating values from one dataframe to another

I have the following dataframe

OCC1990     Skilllevel
 3             1
 8             2
 12            2
 14            3
 15            1 

As illustrated above it contains a long list of occupations assigned to a specific skill level. My actual dataframe is a household survey with millions of rows, including a column which is also named OCC1990. My goal is to implement my assigned skill levels from the above-listed data frame into the household survey.

I applied in the past already the following code for smaller dataframes, which is a pretty manual way

cps_data[cps_data$OCC1990  %in% 3,"skilllevel"] <- 1
cps_data[cps_data$OCC1990  %in% 4:7,"skilllevel"] <- 1
cps_data[cps_data$OCC1990  %in% 8,"skilllevel"] <- 2

But due to the fact that I don't wanna spend hours copying pasting as well as it increases the probability of making mistakes I'm searching for a different, more direct way.

I've already tried to merge both dataframes, but this result in an error related to the size of the vector.

Is there another way than merging just the two dataframes to assign the skill level also to the occupations in the survey?

Many thanks in advance Xx freddy

Upvotes: 1

Views: 50

Answers (1)

Sathish
Sathish

Reputation: 12723

Using data.table for large dataset

create two vectors: levels and labels. The levels contains unique values of OCC1990 and labels contains the new skill levels you want to apply.

Now use levels and labels inside the factor function to modify the skill level. (I used Skilllevel = 3 for OCC1990 = 8 )

library(data.table)
setDT(df)
levels <- c(3:7,8)        # unique values of OCC1990
labels <- c(rep(1,5), 3)  # new Skill levels corresponding to OCC1990
setkey(df, OCC1990)       # sort OCC1990 for speed before filtering
df[ OCC1990 %in% levels, Skilllevel := as.integer(as.character(factor(OCC1990, levels = levels, labels = labels)))]
head(df)
#   OCC1990 Skilllevel
#1:       3          1
#2:       8          3
#3:      12          2
#4:      14          3
#5:      15          1

If you are still facing memory size issues, read in chunks of data from IO (use fread) and apply the above operation and then append data to a new file.

Data:

df <- read.table(text='OCC1990     Skilllevel
 3             1
 8             2
 12            2
 14            3
 15            1 ', header=TRUE)

Upvotes: 1

Related Questions