Alex Raaijmakers
Alex Raaijmakers

Reputation: 3

merge command not matching variables for `matched` variable

I have two datasets.

One containing most of my data:

        ISIN    Name     Year      TotalAssets   Turnover   RoA  Employees

    1.  Nr1     Bank1    2005      30.000         1        0,5    300    
    2.  Nr1     Bank1    2006      31.000         1        0,3    300    
    3.  Nr1     Bank1    2007      28.000         1        0,3    280

    4.  Nr2     Bank2    2005      150.000        0        0,8    2000
    5.  Nr2     Bank2    2006      -              0        -      2200
    6.  Nr2     Bank2    2007      -              0        -      2200

    7.  Nr3     Bank3    2005      20.000         1        0,7    275

And another containing some missing information from the first dataset:

    ISIN   Name    Year   TotalAssets   RoA
1.  Nr2    Bank2   2005    150.000      0,8
2.  Nr2    Bank2   2006    180.000      1,3
3.  Nr2    Bank2   2007    170.000      0,7
4.  Nr2    Bank2   2008    150.000      0,4

When I combine these datasets using:

merge 1:1 ISIN Year using "Dataset2" 

for some reason, only the master only (1) and using only (2) match the variables TotalAssets and RoA, while matched(3) gives missing data for every case on the TotalAssets and RoA variable.

The merged dataset looks something like this:

        ISIN    Name     Year      TotalAssets   Turnover   RoA  Employees   _merge

    1.  Nr1     Bank1    2005      30.000         1        0,5    300    master only (1)  
    2.  Nr1     Bank1    2006      31.000         1        0,3    300    master only (1)    
    3.  Nr1     Bank1    2007      28.000         1        0,3    280    master only (1)  

    4.  Nr2     Bank2    2005      150.000        0        0,8    2000   master only (1)  
    5.  Nr2     Bank2    2006      -              0        -      2200   matched(3)
    6.  Nr2     Bank2    2007      -              0        -      2200   matched(3)
    7.  Nr2     Bank2    2008      150.000        -        0,4    -      using only (2)

    8.  Nr3     Bank3    2005      20.000         1        0,7    275    master only (1)  

However the matched don't show any data for the variables on which it's supposed to merge.

Am I doing something wrong or how would I go about doing it in a better way?

Upvotes: 0

Views: 1594

Answers (2)

user8682794
user8682794

Reputation:

What you need in this particular case is the update option of merge, not replace.

Using your example:

clear

input str10(ISIN) str10(Name) Year TotalAssets Turnover RoA Employees
Nr1 Bank1 2005 30000  1 0.5 300    
Nr1 Bank1 2006 31000  1 0.3 300    
Nr1 Bank1 2007 28000  1 0.3 280
Nr2 Bank2 2005 150000 0 0.8 2000
Nr2 Bank2 2006 .      0 .   2200
Nr2 Bank2 2007 .      0 .   2200
Nr3 Bank3 2005 20000  1 0.7 275
end

save data1, replace

clear

input str10(ISIN) str10(Name) Year TotalAssets RoA
Nr2 Bank2 2005 150000 0.8
Nr2 Bank2 2006 180000 1.3
Nr2 Bank2 2007 170000 0.7
Nr2 Bank2 2008 150000 0.4
end

save data2, replace

use data1,clear
merge 1:1 ISIN Year using data2, update

Which gives you:

list

   +----------------------------------------------------------------------------------+
   | ISIN    Name   Year   TotalA~s   Turnover   RoA   Employ~s                _merge |
   |----------------------------------------------------------------------------------|
1. |  Nr1   Bank1   2005      30000          1    .5        300       master only (1) |
2. |  Nr1   Bank1   2006      31000          1    .3        300       master only (1) |
3. |  Nr1   Bank1   2007      28000          1    .3        280       master only (1) |
4. |  Nr2   Bank2   2005     150000          0    .8       2000           matched (3) |
5. |  Nr2   Bank2   2006     180000          0   1.3       2200   missing updated (4) |
   |----------------------------------------------------------------------------------|
6. |  Nr2   Bank2   2007     170000          0    .7       2200   missing updated (4) |
7. |  Nr2   Bank2   2008     150000          .    .4          .        using only (2) |
8. |  Nr3   Bank3   2005      20000          1    .7        275       master only (1) |
   +----------------------------------------------------------------------------------+

There are two missing values in observation 7 because the variables Turnover and Employees are not present in the second dataset.

Upvotes: 1

user9367317
user9367317

Reputation:

Look into the options for help merge, particularly update and replace. In your specific case, it looks like the update option would be sufficient (_merge==4). Stata will view the master dataset as inviolable unless you explicitly give it instructions on how to deal with same-named variables in the using dataset.

Upvotes: 1

Related Questions