Reputation: 49
I have two data frames:
df1
Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.13 56
@ 0.07 95
@ 0.07 123
df2
Syllable Duration
@ 0.08
@ 0.05
@ 0.07
@ 0.07
I want to merge them into another data frame:
df3
Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.07 95
@ 0.07 123
The problem is that I have repeated Syllable and Duration values. I've tried this code, but it gives me incorrect Pitch:
df3 <- merge(df2, df1[!duplicated(df1$Syllable),], by="Syllable")
df3
Syllable Duration Pitch
@ 0.08 93
@ 0.05 93
@ 0.07 93
@ 0.07 93
Upvotes: 2
Views: 78
Reputation: 12559
With data.table
you can do:
library("data.table")
df1 <- fread(
"Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.13 56
@ 0.07 95
@ 0.07 123")
df2 <- fread(
"Syllable Duration
@ 0.08
@ 0.05
@ 0.07
@ 0.07")
merge(df1, unique(df2))
# > merge(df1, unique(df2))
# Syllable Duration Pitch
# 1: @ 0.05 107
# 2: @ 0.07 95
# 3: @ 0.07 123
# 4: @ 0.08 93
or without sorting:
merge(df1, unique(df2), sort=FALSE)
# > merge(df1, unique(df2), sort=FALSE)
# Syllable Duration Pitch
# 1: @ 0.08 93
# 2: @ 0.05 107
# 3: @ 0.07 95
# 4: @ 0.07 123
this last is the same as:
df1[unique(df2), on=c("Syllable", "Duration")]
# > df1[unique(df2), on=c("Syllable", "Duration")]
# Syllable Duration Pitch
# 1: @ 0.08 93
# 2: @ 0.05 107
# 3: @ 0.07 95
# 4: @ 0.07 123
R
:df1 <- read.table(header=TRUE, text=
"Syllable Duration Pitch
@ 0.08 93
@ 0.05 107
@ 0.13 56
@ 0.07 95
@ 0.07 123")
df2 <- read.table(header=TRUE, text=
"Syllable Duration
@ 0.08
@ 0.05
@ 0.07
@ 0.07 ")
merge(df1, unique(df2))
merge(df1, unique(df2), sort=FALSE)
Upvotes: 4
Reputation: 27
#now keeps unique values for Syllable and the Pitch Values
df1 <- df1[order(df1$Syllable),]
df4<-merge(df2,df1)
df5<-df4[!duplicated(df4$Syllable),]
Upvotes: 1
Reputation: 924
I would propose using the dplyr package. If you use it, you can select the columns which you want to join by. When you join, you should use an semi_join
instead of an inner_join
. The difference is that inner_join
keeps all combinations and possibly duplicates rows ("If there are multiple matches between x and y, all combination of the matches are returned.")
semi_join
on the other hand does: "A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x."
For your case, you can use semi_join(df1, df2, by = c("Syllable", "Duration"))
to merge dataframes. The by
vector defines the column names you want to join by.
This gives you what you wanted:
Syllable Duration Pitch
1 @ 0.08 93
2 @ 0.05 107
3 @ 0.07 95
4 @ 0.07 123
Upvotes: 1