Reputation: 757
i am trying to merge two dataframes in R. i would like to add df2 to df, so that the length of df is maintained, that is any unmatched rows are dropped. I need the smallest negative integer belonging to each person to always be matched to 'month1', the second smallest integer to be matched to month2, the third smallest integer matched to month 3 and so on.
df
names variable
1 jane -4
2 jane -3
3 jane -2
4 john -5
5 john -4
6 john -3
7 john -2
8 john -1
9 john 1
10 john 2
11 mary -3
12 mary -2
13 mary -1
14 mary 1
15 mary 2
16 mary 3
17 mary 4
18 tom -6
19 tom -5
20 tom -4
21 tom -3
22 tom -2
23 tom -1
24 tom 1
df2
noms months
1 jane month1
2 jane month2
3 jane month3
4 jane month4
5 jane month5
6 jane month6
7 jane month7
8 jane month8
9 jane month9
10 jane month10
11 john month1
12 john month2
13 john month3
14 john month4
15 john month5
16 john month6
17 john month7
18 john month8
19 john month9
20 john month10
21 mary month1
22 mary month2
23 mary month3
24 mary month4
25 mary month5
26 mary month6
27 mary month7
28 mary month8
29 mary month9
30 mary month10
31 tom month1
32 tom month2
33 tom month3
34 tom month4
35 tom month5
36 tom month6
37 tom month7
38 tom month8
39 tom month9
40 tom month10
DESIRED OUTPUT
names variable months
1 jane -4 month1
2 jane -3 month2
3 jane -2 month3
4 john -5 month1
5 john -4 month2
6 john -3 month3
7 john -2 month4
8 john -1 month5
9 john 1 month6
10 john 2 month7
11 mary -3 month1
12 mary -2 month2
13 mary -1 month3
14 mary 1 month4
15 mary 2 month5
16 mary 3 month6
17 mary 4 month7
18 tom -6 month1
19 tom -5 month2
20 tom -4 month3
21 tom -3 month4
22 tom -2 month5
23 tom -1 month6
24 tom 1 month7
This seems like it should be a simple merge, but the code isn't working for me. This is what I have tried
final <- merge(df, df2, by.x = "names", by.y = "noms", sort=F,all.x=T,all.y=F)
have also tried this
x<-df2$names %in% df$noms
y<-cbind(df2, x)
matches<-y[y$x!=FALSE,]
I'm sure this is a fundamental problem, but my simple merge code won't work as it should. Thank you in advance for any help.
Upvotes: 2
Views: 1624
Reputation: 59335
An approach using data tables:
library(data.table)
dt <- data.table(df)
dt[,months:=paste0("month",row(dt)[,1]),by=names]
head(dt,10)
# names variable months
# 1: jane -4 month1
# 2: jane -3 month2
# 3: jane -2 month3
# 4: john -5 month1
# 5: john -4 month2
# 6: john -3 month3
# 7: john -2 month4
# 8: john -1 month5
# 9: john 1 month6
# 10: john 2 month7
Upvotes: 3
Reputation: 81683
Here's another approach:
tab <- table(df$names) # count rows per name
# create vector with months
tmp <- unlist(lapply(names(tab), function(x) {
head(df2$months[as.character(df2$noms) == x], tab[x])
}))
# create new data frame
final <- cbind(df, months = tmp)
The result:
names variable months
1 jane -4 month1
2 jane -3 month2
3 jane -2 month3
4 john -5 month1
5 john -4 month2
6 john -3 month3
7 john -2 month4
8 john -1 month5
9 john 1 month6
10 john 2 month7
11 mary -3 month1
12 mary -2 month2
13 mary -1 month3
14 mary 1 month4
15 mary 2 month5
16 mary 3 month6
17 mary 4 month7
18 tom -6 month1
19 tom -5 month2
20 tom -4 month3
21 tom -3 month4
22 tom -2 month5
23 tom -1 month6
24 tom 1 month7
Upvotes: 4