Mixalis

Reputation: 542

Merge different sized data frames

Suppose I have four or more CSV files and I just want to merge them and print the output.

I opened the files like so:

df1 <- read.csv("file1", sep ='\t')
df2 <- read.csv("file2", sep ='\t')
df3 <- read.csv("file3", sep ='\t')
df4 <- read.csv("file4", sep ='\t')

The catch is that all of the files have different numbers of both rows and columns, and no identical column names. My professor said to just merge them, so I'm expecting the output to be something like this:

file1.column11 ... file1.column1N file2.column21 ... file2.column2N ...
value11 ...  value1N    NA  ...  NA
.
.
.
NA  ...  NA    value21  ...  value2N

Can this be done somehow? I've been using merge(), join_all() and other functions, and I can't get to the bottom of this...

I'm also very new to R.

Upvotes: 1

Views: 2222

Answers (3)

Aytalina Azarova

Reputation: 11

Do I understand correctly that you don't have identical column names?

If so, then you can merge() them side by side only if they share a single column whose values can be used as the merge key.

For example, you could have a column for year, or subject id, etc. Then you write:

file.overall <- merge(file1,file2,by="common_column_name")

Next, you connect the next file:

file.overall2 <- merge(file.overall,file3,by="common_column_name")

Keep doing this sequentially until you have added all the files.

If you want the columns to have different names, just rename the columns beforehand:

names(file1) <- c("file1.column1_name", ...)

On the other hand, if you want to merge files one under the other, then all of your columns must have identical names, and you can use rbind().
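
The sequential merging described above can also be folded over a list with Reduce(). This is only a minimal sketch on toy data: the key column "id" and the contents of file1, file2 and file3 are made up for illustration, and all = TRUE is an added assumption so that rows without a match are kept and filled with NA (plain merge() would drop them).

```r
# Toy data frames standing in for the real files; "id" is a hypothetical shared key.
file1 <- data.frame(id = c(1, 2, 3), a = c("x", "y", "z"))
file2 <- data.frame(id = c(2, 3, 4), b = c(10, 20, 30))
file3 <- data.frame(id = c(1, 4), c = c(TRUE, FALSE))

# Merge the first two, then merge the result with the third, and so on.
# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA.
file.overall <- Reduce(
  function(x, y) merge(x, y, by = "id", all = TRUE),
  list(file1, file2, file3)
)
print(file.overall)
```

Adding another file then only means adding it to the list, rather than writing one more merge() call.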

Upvotes: 1

drgxfs

Reputation: 1127

If all the data frames have different row names, you can merge them in the following way:

merge(df1, df2, by="row.names", all=TRUE)

Otherwise, if you have plyr installed, you can just do the following (it will fill all empty cells with NAs):

library(plyr)
rbind.fill(df1, df2)
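
If plyr is not available, roughly the same stacked layout can be built in base R. The following is a sketch, not plyr's implementation: stack_fill is a hypothetical helper name, and df1 and df2 here are toy stand-ins for the real files. It pads each data frame with the columns it lacks (as NA) and then stacks them with rbind().

```r
# Toy data frames with no column names in common (stand-ins for the real df1, df2).
df1 <- data.frame(a = 1:2, b = c("x", "y"))
df2 <- data.frame(c = c(3.5, 4.5, 5.5))

# Hypothetical helper: add missing columns as NA, align column order, then stack.
stack_fill <- function(...) {
  dfs <- list(...)
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- NA          # column this data frame does not have
    }
    d[all_cols]               # same column order in every data frame
  })
  do.call(rbind, padded)
}

combined <- stack_fill(df1, df2)
print(combined)
```

The result has one row per input row and one column per distinct column name, with NA wherever a file had no such column, which matches the layout sketched in the question.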

Upvotes: 0

Richie Cotton

Reputation: 121057

In general, there is no single way to merge data frames with different content.

You need to work out how you want to merge the datasets. Some things to think about:

  • Do any of the datasets have the same type of thing in any columns (even if the column names are different)?
  • Which bits of the data do you want to keep/discard?
  • Which bits of the data are common to all the datasets?
  • What relationship do other columns have? Is there a one-to-one or one-to-many relationship between any columns?
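
A few of these questions can be explored directly in R before deciding how to merge. The sketch below uses made-up data frames (the columns id, height and weight are assumptions for illustration only) to list the column names shared by every data set, show each column's type, and check whether a candidate key is unique per row.

```r
# Toy data frames; the column names are hypothetical.
df1 <- data.frame(id = 1:3, height = c(1.6, 1.7, 1.8))
df2 <- data.frame(id = 2:4, weight = c(60, 70, 80))
dfs <- list(df1 = df1, df2 = df2)

# Column names present in every data frame (candidate merge keys).
common_cols <- Reduce(intersect, lapply(dfs, names))
print(common_cols)

# Column types per data frame, to spot the same kind of data under different names.
print(lapply(dfs, function(d) sapply(d, class)))

# TRUE where the candidate key has no duplicated values in that data frame.
key_is_unique <- sapply(dfs, function(d) !anyDuplicated(d$id))
print(key_is_unique)
```

If no column is shared and none holds comparable values, a keyed merge() has nothing to match on, and stacking with NA padding is probably the only sensible reading of "just merge them".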

Upvotes: 0
