William Liu
William Liu

Reputation: 339

Match name and change data set to wide format in r

I have two df like this:

Name1            Task   Bond    Goal    Name2          Rapport  Client
Abigail Adkins   24     20       25     Adam Tharkur    4.08    4.29
Abigail Brusse   25     26       24     Adam Tharkur    6.08    6
Adam Tharker     24     24       24     Anne Corinne    5.5     6.29
Adriel Parks     27     18       26     Alan Hamaoui    7       6.43
Allison Mofsky   21     28       20     Alyssa Sorrells 5.67    5.29

df1 has Name1, Task, Bond and Goal, df2 has Name2, Rapport and Client. My goal is to select the Names that are the same in two dfs, and have their five scores like this:

Name   Task     Bond     Goal    Rapport1   Client1  Rapport2   Client2
Adam    24      24       24      4.08       4.29     6.08       6

Each name might have multiple observations (less than 10) like "Adam", and I only want to match the first name in selection, because some I have not entered the last name. Do someone have any ideas? Thank you a lot.

Reproducible data:

dat <- read.csv(text="Name1,Task,Bond,Goal,Name2,Rapport,Client
Abigail Adkins,24,20,25,Adam Tharkur,4.08,4.29
Abigail Brusse,25,26,24,Adam Tharkur,6.08,6
Adam Tharker,24,24,24,Anne Corinne,5.5,6.29
Adriel Parks,27,18,26,Alan Hamaoui,7,6.43
Allison Mofsky,21,28,20,Alyssa Sorrells,5.67,5.29",header=TRUE,stringsAsFactors=FALSE)

df1 <- dat[1:4]
df2 <- dat[5:7]

Upvotes: 2

Views: 89

Answers (1)

akrun
akrun

Reputation: 886938

The 'first names' can be extracted using sub, merge the two dataset by the 'Name' column, create a sequence index ('indx') for the grouping variable ('Name1'), and use reshape to change the format from 'long' to 'wide'.

df1$Name1 <- sub(' .*$', '', df1$Name1)
df2$Name2 <- sub(' .*$', '', df2$Name2)
dfN <- merge(df1, df2, by.x='Name1', by.y='Name2')
dfN$indx <- with(dfN, ave(seq_along(Name1), Name1, FUN=seq_along))
reshape(dfN, idvar=c('Name1', 'Task', 'Bond', 'Goal'), 
             timevar='indx', direction='wide')
#   Name1 Task Bond Goal Rapport.1 Client.1 Rapport.2 Client.2
#1  Adam   24   24   24      4.08     4.29      6.08        6

Or we can use the devel version of data.table i.e. v1.9.5. Instructions for installing it are here.

Remove the 'last name' in the 'Name' columns of both dataset (as showed above), then convert the 'df1' to 'data.table' (setDT(df1)), and set the key column as 'Name1' (setkey(.., Name1)). Join with 'df2', create a sequence column ('N') for the 'Name1', and use dcast. In the devel version, the dcast can take multiple value.var columns.

library(data.table)#v1.9.5+
dcast(setkey(setDT(df1), Name1)[df2, nomatch=0][, N:=1:.N, Name1],
             ...~N, value.var= c('Rapport', 'Client'))
#    Name1 Task Bond Goal 1_Rapport 2_Rapport 1_Client 2_Client
#1:  Adam   24   24   24      4.08      6.08     4.29        6

Upvotes: 4

Related Questions