Enrique
Enrique

Reputation: 862

Compare and merge columns with different lenght adding NA in the empty rows

I tried to find a solution in SOF but I didn't find anything...

I've got a two data frame with this kind of data.

    > df
        1  |UNIMOD:730
        2  |UNIMOD:4
        3  |UNIMOD:214
        4  |UNIMOD:21
        5  |UNIMOD:35
              .
              .
              .
            n+1500

And another one with this:

> df2
            1  |UNIMOD:730
            2  |UNIMOD:4
            3  |UNIMOD:21
            4  |UNIMOD:35
                  .
                  .
                  .
                n+500

What I would like to have is this kind of output, where merge the column, comparing the values and add NA where the value doesn't exist. There are not repeated values.

    > df
        1  |UNIMOD:730 | UNIMOD:730
        2  |UNIMOD:4   | UNIMOD:4
        3  |UNIMOD:214 | NA
        4  |UNIMOD:21  | UNIMOD:21
        5  |UNIMOD:35  | UNIMOD:35
              .            .
              .            .
              .            .            
            n+1500       n+1500   

I have tried with the option merge but this function just merge in one column all my data and if a use this:

left_join(df, df2, c("sequence"="sequence"))

I just obtain the same result.

Here you have a reproducible example:

df <- data.frame(modifications=c("null", "0-UNIMOD:214", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12", "0-UNIMOD:123", "13-UNIMOD:212"))

df2 <- data.frame(modifications=c("null", "0-UNIMOD:24", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12"))

Upvotes: 0

Views: 59

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50668

Is this what you're after (base R only, using ?match)?

# Your data with added two columns
df1 <- cbind.data.frame(modifications=c("null", "0-UNIMOD:214", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12", "0-UNIMOD:123", "13-UNIMOD:212"),
            df1col2 = "something",
            df1col3 = "val1");

df2 <- cbind.data.frame(modifications=c("null", "0-UNIMOD:24", "2-UNIMOD:3","12-UNIMOD:24","1-UNIMOD:44","0-UNIMOD:12"),
            df2col2 = "anotherthing",
            df2col3 = "val2");


# Merge df1 and merge2
df <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ]);
     modifications   df1col2 df1col3 modifications      df2col2 df2col3
1             null something    val1          null anotherthing    val2
NA    0-UNIMOD:214 something    val1          <NA>         <NA>    <NA>
3       2-UNIMOD:3 something    val1    2-UNIMOD:3 anotherthing    val2
4     12-UNIMOD:24 something    val1  12-UNIMOD:24 anotherthing    val2
5      1-UNIMOD:44 something    val1   1-UNIMOD:44 anotherthing    val2
6      0-UNIMOD:12 something    val1   0-UNIMOD:12 anotherthing    val2
NA.1  0-UNIMOD:123 something    val1          <NA>         <NA>    <NA>
NA.2 13-UNIMOD:212 something    val1          <NA>         <NA>    <NA>

# Or merge and remove the duplicate modifcations column (if necessary)
df <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), -1]);
print(df);
     modifications   df1col2 df1col3      df2col2 df2col3
1             null something    val1 anotherthing    val2
NA    0-UNIMOD:214 something    val1         <NA>    <NA>
3       2-UNIMOD:3 something    val1 anotherthing    val2
4     12-UNIMOD:24 something    val1 anotherthing    val2
5      1-UNIMOD:44 something    val1 anotherthing    val2
6      0-UNIMOD:12 something    val1 anotherthing    val2
NA.1  0-UNIMOD:123 something    val1         <NA>    <NA>
NA.2 13-UNIMOD:212 something    val1         <NA>    <NA>

Upvotes: 1

Related Questions