Alec Max
Alec Max

Reputation: 35

How to merge data with the same country name but different years?

Excel/Stata newbie here!

I am trying to merge two different datasets in Excel (could also use Stata). Here are the two tables I am merging:

First table:

|---------------------|------------------|------------------|
|      Country        |   Coefficient 1  |        Year      |
|---------------------|------------------|------------------|
|      Afghanistan    |         .34      |        1999      |
|---------------------|------------------|------------------|
|      Afghanistan    |         .22      |        2010      |
|---------------------|------------------|------------------|
|      Albania        |         .7       |        1971      |
|---------------------|------------------|------------------|                 

Second Table:

|---------------------|------------------|------------------|
|      Country        |  Coefficient 2   |        Year      |
|---------------------|------------------|------------------|
|      Afghanistan    |         .10      |        1999      |
|---------------------|------------------|------------------|
|      Afghanistan    |         .4       |        2004      |
|---------------------|------------------|------------------|
|      Albania        |         .2       |        1970      |
|---------------------|------------------|------------------|      

So whilst the countries are the same, the coefficients are from different years.

Here is what I would like it to look like:

enter image description here

So the years are merged, and if both coefficients are in the same year then they will appear on the same row. If there is only one coefficient present for a year, then I want the year and the available coefficient to appear, and the cell for the other coefficient can be blank. (It doesn't really matter how the data is sorted).

Does anyone have any idea how to do this? I've tried YLookup, HLookup, and Merge tool, but I can't get anywhere with it!

Would greatly appreciate your thoughts.

Upvotes: 0

Views: 1073

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60174

In Excel, you can combine tables using Power Query:

let
    Source1 = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source1,{{"Country", type text}, {"Coefficient 1", type number}, {"Year", Int64.Type}}),

    Source2 = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type2" = Table.TransformColumnTypes(Source2,{{"Country", type text}, {"Coefficient 2", type number}, {"Year", Int64.Type}}),

    combTbl = Table.Combine({#"Changed Type",#"Changed Type2"}),
    #"Reordered Columns" = Table.ReorderColumns(combTbl,{"Country", "Year", "Coefficient 1", "Coefficient 2"}),
    #"Grouped Rows" = Table.Group(#"Reordered Columns", {"Year", "Country"}, {{"Grouped", each _, type table [Country=nullable text, Year=nullable number, Coefficient 1=nullable number, Coefficient 2=nullable number]}}),
    #"Sorted Rows" = Table.Sort(#"Grouped Rows",{{"Country", Order.Ascending}, {"Year", Order.Ascending}}),
    #"Reordered Columns1" = Table.ReorderColumns(#"Sorted Rows",{"Country", "Year", "Grouped"}),
    #"Added Custom" = Table.AddColumn(#"Reordered Columns1", "Coefficient", each List.ReplaceValue(
            {List.Max(Table.Column([Grouped],"Coefficient 1")),
             List.Max(Table.Column([Grouped],"Coefficient 2"))},
                null,"",Replacer.ReplaceValue)),
                
    #"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Coefficient", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
    #"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"Grouped"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns", "Coefficient", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Coefficient.1", "Coefficient.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Coefficient.1", type number}, {"Coefficient.2", type number}})
in
    #"Changed Type1"

enter image description here

Upvotes: 0

Bicep
Bicep

Reputation: 1103

You can use the merge command in Stata. Merge on the country and year variables.

clear

input str20 country coefficient1 year
afghanistan 0.34 1999
afghanistan 0.22 2010
albania     0.70 1971
end

tempfile data1
save `data1'

clear

input str20 country coefficient2 year
afghanistan 0.10 1999
afghanistan 0.40 2004
albania     0.20 1970
end

merge 1:1 country year using `data1'

sort country year
order country year coefficient1 coefficient2

list, sepby(country)

     +------------------------------------------------------------+
     |     country   year   coeffi~1   coeffi~2            _merge |
     |------------------------------------------------------------|
  1. | afghanistan   1999        .34         .1       matched (3) |
  2. | afghanistan   2004          .         .4   master only (1) |
  3. | afghanistan   2010        .22          .    using only (2) |
     |------------------------------------------------------------|
  4. |     albania   1970          .         .2   master only (1) |
  5. |     albania   1971         .7          .    using only (2) |
     +------------------------------------------------------------+

Upvotes: 1

Related Questions