Reputation: 15
I am trying to combine two data frames on a column called username. One data frame has 12 variables and 1619 rows of observations; the other has 37 columns and 1603 observations. I'd like to match the usernames across the two data sets but keep all the data. I have tried merge, but I always get NA for the Y set of data (unless the column name is in both sets of data). Is there a way to append one set of data to the other via a column such as "username"?
Example below:
DataFrame 1
Username      HighschoolGPA  Age  Applydate
Smith, John   3.1            18   03-12-2012
DataFrame 2
Username      LiveOnCampus  Major      StudentGroup_Academic
Smith, John   Yes           Chemistry  No
Final DataFrame
Username      HighschoolGPA  Age  Applydate   LiveOnCampus  Major      StudentGroup_Academic
Smith, John   3.1            18   03-12-2012  Yes           Chemistry  No
Upvotes: 0
Views: 372
Reputation: 2962
You usually get NA for the Y set of the data when merge is matching on multiple columns and generating too many unique combinations. By default, merge joins on every column name the two data frames have in common.
Make sure the username columns are the same type, make sure they aren't factors, and pass more arguments to merge explicitly.
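The type check above can be sketched as follows. The data here is a hypothetical toy example (the real df1 and df2 are not shown in the question), and the trailing space in one username is an assumed cause of mismatches, added purely for illustration.

```r
# Hypothetical toy data: df1 stores Username as a factor with a stray
# trailing space, df2 stores it as a plain character column.
df1 <- data.frame(Username = factor("Smith, John "), Age = 18)
df2 <- data.frame(Username = "Smith, John", Major = "Chemistry",
                  stringsAsFactors = FALSE)

# Inspect the type of each key column.
class(df1$Username)
class(df2$Username)

# Convert both keys to character and strip leading/trailing whitespace.
df1$Username <- trimws(as.character(df1$Username))
df2$Username <- trimws(as.character(df2$Username))

# List usernames that appear in one data frame but not the other.
setdiff(df1$Username, df2$Username)
setdiff(df2$Username, df1$Username)
```

If both setdiff calls come back empty, every username has a partner in the other data frame and a merge on that column should not introduce NA rows.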
Try merge(df1, df2, by = "Username", all.x = TRUE, all.y = TRUE)
if you would like to keep all results, matched and unmatched.
Try merge(df1, df2, by = "Username", all.x = FALSE, all.y = FALSE)
if you want to keep only entries that have a matched username.
Note that R is case-sensitive, so the by argument has to match the column name exactly (Username in your example).
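To see the difference between the two calls, here is a small sketch on made-up data. The usernames "Doe, Jane" and "Roe, Sam" are hypothetical and only there to produce unmatched rows; all = TRUE is merge's shorthand for setting both all.x and all.y to TRUE.

```r
# Hypothetical toy data: "Doe, Jane" appears only in df1,
# "Roe, Sam" appears only in df2.
df1 <- data.frame(Username = c("Smith, John", "Doe, Jane"),
                  Age = c(18, 19), stringsAsFactors = FALSE)
df2 <- data.frame(Username = c("Smith, John", "Roe, Sam"),
                  Major = c("Chemistry", "Physics"), stringsAsFactors = FALSE)

# Full outer join: every username is kept, and columns with no
# matching row on the other side are filled with NA.
full <- merge(df1, df2, by = "Username", all = TRUE)

# Inner join (the default): only usernames present in both are kept.
inner <- merge(df1, df2, by = "Username")

nrow(full)   # 3
nrow(inner)  # 1
```

In the full join, the NA values mark exactly the rows whose username had no partner, so seeing NA in the Y columns there is expected rather than a sign that the merge failed.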
Hope this helps!
Upvotes: 0
Reputation: 35314
# Rebuild the two example data frames from the question.
df1 <- data.frame(Username = 'Smith, John', HighschoolGPA = 3.1, Age = 18,
                  Applydate = '03-12-2012', stringsAsFactors = FALSE)
df2 <- data.frame(Username = 'Smith, John', LiveOnCampus = 'Yes', Major = 'Chemistry',
                  StudentGroup_Academic = 'No', stringsAsFactors = FALSE)
# Join on the shared key column.
merge(df1, df2, by = 'Username')
##      Username HighschoolGPA Age  Applydate LiveOnCampus     Major StudentGroup_Academic
## 1 Smith, John           3.1  18 03-12-2012          Yes Chemistry                    No
Upvotes: 1