Reputation: 11366
I have two data frames, one big and one small (the actual dataset is very, very big!). The following is just a working example.
big <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55)
SN names var
1 1 A 51
2 2 B 52
3 3 C 53
4 4 D 54
5 5 E 55
small <- data.frame (names = c("A", "C", "E"), type = c("New", "Old", "Old") )
names type
1 A New
2 C Old
3 E Old
Now I need to create a new variable in "big" using the "type" variable in "small". Where a value in the names column of "big" matches one in "small", the corresponding type should be stored in a new column "type". Where there is no match between the names columns, the value should be "Unknown". The expected output is as follows:
resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
type = c("New","Unknown", "Old", "Unknown", "Old"))
resultdf
SN names var type
1 1 A 51 New
2 2 B 52 Unknown
3 3 C 53 Old
4 4 D 54 Unknown
5 5 E 55 Old
I know this is a simple question for experts, but I could not figure it out.
Upvotes: 1
Views: 2785
Reputation: 263301
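# Look-up vector: the types from small, plus "Unknown" as a final fallback entry
big$type <- c(as.character(small$type), "Unknown")[
  match(
    x = big$names,
    table = small$names,
    nomatch = length(small$type) + 1)]  # unmatched names point at "Unknown"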
The basic strategy is to convert the factor small$type to character, append an "Unknown" value, and then use match() on big$names to look up the index of each name in small$names. Names with no match get the index length(small$type) + 1, which points at the appended "Unknown". Generating indices is a typical use of the match function.
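To make the indexing step concrete, here is a minimal sketch that runs the same match() call on plain character vectors copied from the question's toy data (the variable names big_names, small_names, small_type and lookup are just for illustration):

```r
# Toy vectors mirroring big$names, small$names and small$type
big_names   <- c("A", "B", "C", "D", "E")
small_names <- c("A", "C", "E")
small_type  <- c("New", "Old", "Old")

# For each element of big_names, match() returns its position in small_names;
# names that are not found get nomatch = 4, one past the end of small_type
idx <- match(big_names, small_names, nomatch = length(small_type) + 1)
print(idx)          # 1 4 2 4 3

# Appending "Unknown" makes position 4 valid, so indexing fills in the fallback
lookup <- c(small_type, "Unknown")
print(lookup[idx])  # "New" "Unknown" "Old" "Unknown" "Old"
```

Because the result is a plain index vector, the whole lookup is a single vectorized subset, which scales well to a very large "big".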
Upvotes: 1
Reputation: 162321
First use merge() with the argument all=TRUE to merge the two data.frames, keeping the rows of big that found no matching value in small$names. Then replace the elements of big$type that didn't find a match (which merge() fills with NA) with the string "Unknown". (all=TRUE would also keep rows of small whose names do not appear in big; all.x=TRUE keeps only the rows of big.)
Note that because big and small share only one column name, that column is used by default to perform the merge. For more control over which columns the merge is based on, see the function's by, by.x, and by.y arguments.
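As an illustration of by.x and by.y, here is a sketch assuming a hypothetical variant of the data in which the key column of the small table is called "id" rather than "names" (small_id is a made-up name). It also uses all.x=TRUE so that only the rows of big are kept:

```r
# Same big table as in the question, with character columns
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
# Hypothetical lookup table whose key column has a different name ("id")
small_id <- data.frame(id = c("A", "C", "E"), type = c("New", "Old", "Old"),
                       stringsAsFactors = FALSE)

# by.x / by.y name the key column in each data frame;
# all.x = TRUE keeps every row of big, even those without a match
out <- merge(big, small_id, by.x = "names", by.y = "id", all.x = TRUE)
out$type[is.na(out$type)] <- "Unknown"
print(out)
```

The key column in the result keeps the name from the first data frame ("names").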
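# stringsAsFactors=FALSE keeps type as character, so "Unknown" can be assigned
small <- data.frame (names = c("A", "C", "E"),
                     type = c("New", "Old", "Old"), stringsAsFactors=FALSE)
big <- data.frame (SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                   stringsAsFactors=FALSE)
# Merge on the shared column "names"; rows of big without a match get NA in type
big <- merge(big, small, all=TRUE)
# Replace the NAs left by non-matching names with "Unknown"
big$type[is.na(big$type)] <- "Unknown"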
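One side effect worth knowing: merge() places the key column first and, by default, sorts rows by it, so the result has columns names, SN, var, type rather than the order in resultdf. A minimal sketch of restoring the question's layout and checking it against the expected output (the variable name merged is just for illustration):

```r
small <- data.frame(names = c("A", "C", "E"), type = c("New", "Old", "Old"),
                    stringsAsFactors = FALSE)
big <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                  stringsAsFactors = FALSE)
resultdf <- data.frame(SN = 1:5, names = c("A", "B", "C", "D", "E"), var = 51:55,
                       type = c("New", "Unknown", "Old", "Unknown", "Old"),
                       stringsAsFactors = FALSE)

merged <- merge(big, small, all = TRUE)
merged$type[is.na(merged$type)] <- "Unknown"

# Restore the original row order (by SN) and column order, then reset row names
merged <- merged[order(merged$SN), names(resultdf)]
rownames(merged) <- NULL
print(all.equal(merged, resultdf))  # TRUE
```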
Upvotes: 2