Reputation: 757
I want to double check something. I'm trying to merge a large dataset into a smaller dataset. My large dataframe has observations that are not included in the small dataset. I cannot obtain a simple merge with my dataframes using the standard code
x<-merge(df1,df2) ###default is all=FALSE.
###output from this code produces a df with 49 rows instead of 13
have also used;
x<-merge(df1,df2, by='noms') ##output produces 49 rows instead of 13
After much reading and checking for people who have alreday asked this question e.g., Merge 2 data frames, discard unmatched rows i came across this https://stat.ethz.ch/pipermail/r-help/2006-September/113148.html which says that there is no unambiguous way to fix this problem.
Is this still the case? apologies if this has been answered somewhere already, I've tried reading the core documents ?merge and posts on stackoverflow - but am now at a loose end.
my dfs are below
Small dataframe
noms fruits apple orange kiwi all_comb comb numbers
1 mary apple 1 0 0 1 1 1
2 mary grape 0 0 0 0 1 2
3 mary orange 0 1 0 0 1 3
4 mary apple 1 0 0 1 1 4
5 john banana 0 0 0 0 1 1
6 john apple 1 0 0 1 1 2
7 john apple 1 0 0 1 1 3
8 john apple 1 0 0 1 1 4
9 lucy kiwi 0 0 1 0 1 1
10 lucy orange 0 1 0 0 1 2
11 lucy apple 1 0 0 1 1 3
12 lucy berry 0 0 0 0 1 4
13 tom orange 0 1 0 0 1 1
Large dataframe
noms age
1 jane 50
2 jane 50
3 jane 50
4 jane 50
5 mary 65
6 mary 65
7 mary 65
8 mary 65
9 john 34
10 john 34
11 john 34
12 john 34
13 pat 65
14 pat 65
15 pat 65
16 lucy 89
17 lucy 89
18 lucy 89
19 lucy 89
20 tom 12
Desired output
df
noms fruits apple orange kiwi all_comb comb numbers age
1 mary apple 1 0 0 1 1 1 65
2 mary grape 0 0 0 0 1 2 65
3 mary orange 0 1 0 0 1 3 65
4 mary apple 1 0 0 1 1 4 65
5 john banana 0 0 0 0 1 1 34
6 john apple 1 0 0 1 1 2 34
7 john apple 1 0 0 1 1 3 34
8 john apple 1 0 0 1 1 4 34
9 lucy kiwi 0 0 1 0 1 1 89
10 lucy orange 0 1 0 0 1 2 89
11 lucy apple 1 0 0 1 1 3 89
12 lucy berry 0 0 0 0 1 4 89
13 tom orange 0 1 0 0 1 1 12
Upvotes: 1
Views: 103
Reputation: 81693
It works if you use the unique rows of df2
:
merge(df1, unique(df2))
noms fruits apple orange kiwi all_comb comb numbers age
1 john banana 0 0 0 0 1 1 34
2 john apple 1 0 0 1 1 2 34
3 john apple 1 0 0 1 1 3 34
4 john apple 1 0 0 1 1 4 34
5 lucy kiwi 0 0 1 0 1 1 89
6 lucy orange 0 1 0 0 1 2 89
7 lucy apple 1 0 0 1 1 3 89
8 lucy berry 0 0 0 0 1 4 89
9 mary apple 1 0 0 1 1 1 65
10 mary grape 0 0 0 0 1 2 65
11 mary orange 0 1 0 0 1 3 65
12 mary apple 1 0 0 1 1 4 65
13 tom orange 0 1 0 0 1 1 12
Upvotes: 3
Reputation: 6365
Is this what you want to do?
df_agg <- aggregate(age ~ noms, df_large, max)
merge(df_agg, df_small, by = "noms")
or if you don't care about the ages,
df_agg <- data.frame(nom = unique(df_large$noms))
merge(df_agg, df_small, by = "noms")
Upvotes: 2