Reputation: 339
I have two df like this:
Name1 Task Bond Goal Name2 Rapport Client
Abigail Adkins 24 20 25 Adam Tharkur 4.08 4.29
Abigail Brusse 25 26 24 Adam Tharkur 6.08 6
Adam Tharker 24 24 24 Anne Corinne 5.5 6.29
Adriel Parks 27 18 26 Alan Hamaoui 7 6.43
Allison Mofsky 21 28 20 Alyssa Sorrells 5.67 5.29
df1 has Name1, Task, Bond and Goal, df2 has Name2, Rapport and Client. My goal is to select the Names that are the same in two dfs, and have their five scores like this:
Name Task Bond Goal Rapport1 Client1 Rapport2 Client2
Adam 24 24 24 4.08 4.29 6.08 6
Each name might have multiple observations (less than 10) like "Adam", and I only want to match the first name in selection, because some I have not entered the last name. Do someone have any ideas? Thank you a lot.
Reproducible data:
dat <- read.csv(text="Name1,Task,Bond,Goal,Name2,Rapport,Client
Abigail Adkins,24,20,25,Adam Tharkur,4.08,4.29
Abigail Brusse,25,26,24,Adam Tharkur,6.08,6
Adam Tharker,24,24,24,Anne Corinne,5.5,6.29
Adriel Parks,27,18,26,Alan Hamaoui,7,6.43
Allison Mofsky,21,28,20,Alyssa Sorrells,5.67,5.29",header=TRUE,stringsAsFactors=FALSE)
df1 <- dat[1:4]
df2 <- dat[5:7]
Upvotes: 2
Views: 89
Reputation: 886938
The 'first names' can be extracted using sub
, merge
the two dataset by the 'Name' column, create a sequence index ('indx') for the grouping variable ('Name1'), and use reshape
to change the format from 'long' to 'wide'.
df1$Name1 <- sub(' .*$', '', df1$Name1)
df2$Name2 <- sub(' .*$', '', df2$Name2)
dfN <- merge(df1, df2, by.x='Name1', by.y='Name2')
dfN$indx <- with(dfN, ave(seq_along(Name1), Name1, FUN=seq_along))
reshape(dfN, idvar=c('Name1', 'Task', 'Bond', 'Goal'),
timevar='indx', direction='wide')
# Name1 Task Bond Goal Rapport.1 Client.1 Rapport.2 Client.2
#1 Adam 24 24 24 4.08 4.29 6.08 6
Or we can use the devel version of data.table
i.e. v1.9.5. Instructions for installing it are here
.
Remove the 'last name' in the 'Name' columns of both dataset (as showed above), then convert the 'df1' to 'data.table' (setDT(df1)
), and set the key column as 'Name1' (setkey(.., Name1)
). Join with 'df2', create a sequence column ('N') for the 'Name1', and use dcast
. In the devel version, the dcast
can take multiple value.var
columns.
library(data.table)#v1.9.5+
dcast(setkey(setDT(df1), Name1)[df2, nomatch=0][, N:=1:.N, Name1],
...~N, value.var= c('Rapport', 'Client'))
# Name1 Task Bond Goal 1_Rapport 2_Rapport 1_Client 2_Client
#1: Adam 24 24 24 4.08 6.08 4.29 6
Upvotes: 4