jon

Reputation: 11366

loop or function to compare two column values and create new variable in R

I have two data frames, one big and one small (the actual dataset is very large). The following is just a working example.

big  <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55)

  SN names var
1  1     A  51
2  2     B  52
3  3     C  53
4  4     D  54
5  5     E  55

small <- data.frame (names = c("A", "C", "E"), type = c("New", "Old", "Old") )
  names type
1     A  New
2     C  Old
3     E  Old

Now I need to create a new variable in "big" with the help of the "type" variable in "small". Where the names in small and big match, the corresponding type should be stored in a new column type. Where a name in big has no match in small, the value should be "Unknown". The expected output is as follows:

resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55, 
              type = c("New","Unknown", "Old", "Unknown", "Old"))

resultdf 
  SN names var    type
1  1     A  51     New
2  2     B  52 Unknown
3  3     C  53     Old
4  4     D  54 Unknown
5  5     E  55     Old

I know this is a simple question for experts, but I could not figure it out.

Upvotes: 1

Views: 2785

Answers (2)

IRTFM

Reputation: 263301

big$type <- c(as.character(small$type),"Unknown") [
                                    match(
                                       x=big$names, 
                                       table=small$names, 
                                       nomatch=length(small$type)+1)]

The basic strategy is to convert the factor to character, append an "Unknown" value, and then use big$names to look up the matching index into small$type. Names with no match get the index of the appended "Unknown" element through the nomatch argument. Generating indices is a typical use of the match function.
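To make the lookup easier to follow, here is a minimal sketch that splits it into steps, using the data from the question. It is a variant of the one-liner above rather than a replacement: it leaves nomatch at its default (NA) and fills the gaps afterwards, and it assumes the columns are character (stringsAsFactors = FALSE) so no factor conversion is needed. The name idx is introduced here only for illustration.

```r
# Same data as in the question; stringsAsFactors = FALSE keeps columns as character
big   <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                    stringsAsFactors = FALSE)
small <- data.frame(names = c("A", "C", "E"), type = c("New", "Old", "Old"),
                    stringsAsFactors = FALSE)

# Position of each big$names value within small$names; NA where there is no match
idx <- match(big$names, small$names)   # 1 NA 2 NA 3

# Index small$type by those positions, then fill the unmatched rows
big$type <- small$type[idx]
big$type[is.na(idx)] <- "Unknown"      # New Unknown Old Unknown Old
```

Because match() returns one index per element of big$names, the row order of big is left unchanged.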

Upvotes: 1

Josh O'Brien

Reputation: 162321

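First use merge() with the argument all=TRUE to merge the two data.frames, keeping the rows of big that have no matching value in small$names. (all=TRUE also keeps any rows of small with no match in big; there are none here, and all.x=TRUE would keep only the rows of big.) Then replace the elements of the merged type column that found no match (which merge() fills with NA) with the string "Unknown".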

Note that because big and small share just one column name in common, that column is by default used to perform the merge. For more control over which columns are used as the basis of the merge, see the function's by, by.x, and by.y arguments.

small <- data.frame (names = c("A", "C", "E"), 
                     type = c("New", "Old", "Old"), stringsAsFactors=FALSE)
big  <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                    stringsAsFactors=FALSE)

big <- merge(big, small, all=TRUE)
big$type[is.na(big$type)] <- "Unknown"
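If small could contain names that never occur in big, a left join may be closer to what the question asks for. The following is a sketch under that assumption: small2 and its extra name "Z" are hypothetical and not part of the original data, and by = "names" is spelled out only to make the join column explicit.

```r
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
# Hypothetical variant of `small` with an extra name ("Z") that does not occur in big
small2 <- data.frame(names = c("A", "C", "E", "Z"),
                     type = c("New", "Old", "Old", "New"),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of big and drops unmatched rows of small2
res <- merge(big, small2, by = "names", all.x = TRUE)
res$type[is.na(res$type)] <- "Unknown"

# all = TRUE would additionally keep the unmatched "Z" row from small2
full <- merge(big, small2, by = "names", all = TRUE)

nrow(res)    # 5, one row per row of big
nrow(full)   # 6, includes the "Z" row
```

Note that merge() puts the key column first and, by default, sorts the rows by it, so the column order of the result differs from the expected output in the question.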

Upvotes: 2
