Gertjan V
Gertjan V

Reputation: 11

Repeat a specific value per firm to all years

I have a panel data ranging from year 1917 to 1922 with various variables (for example Leverage) for 200 firms. It looks something like this:

  1. 1917 1 0.1
  2. 1918 1 0.2
  3. 1919 1 0.3
  4. 1917 2 0.4
  5. 1918 2 0.5
  6. 1917 3 0.6
  7. 1918 3 0.7
  8. 1919 3 0.8
  9. 1920 3 0.9 ....

I want to copy the values for 1917 to all other years (1918, 1919,...) for the variables per FirmID. As in my example, not all years are present (so I cannot say that the value is there every X row). The result must be something like:

  1. 1917 1 0.1
  2. 1918 1 0.1
  3. 1919 1 0.1
  4. 1917 2 0.4
  5. 1918 2 0.4
  6. 1917 3 0.6
  7. 1918 3 0.6
  8. 1919 3 0.6
  9. 1920 3 0.6 ....

Upvotes: 0

Views: 31

Answers (2)

Nick Cox
Nick Cox

Reputation: 37278

Borrowing @Cybernike's helpful data example, here are two ways to do it in one line

clear
input year id leverage
1917 1 0.1
1918 1 0.2
1919 1 0.3
1917 2 0.4
1918 2 0.5
1917 3 0.6
1918 3 0.7
1919 3 0.8
1920 3 0.9
end

egen wanted1 = mean(cond(year == 1917, leverage, .)), by(id)

egen wanted2 = mean(leverage / (year == 1917)), by(id)

list, sepby(id)

     +------------------------------------------+
     | year   id   leverage   wanted1   wanted2 |
     |------------------------------------------|
  1. | 1917    1         .1        .1        .1 |
  2. | 1918    1         .2        .1        .1 |
  3. | 1919    1         .3        .1        .1 |
     |------------------------------------------|
  4. | 1917    2         .4        .4        .4 |
  5. | 1918    2         .5        .4        .4 |
     |------------------------------------------|
  6. | 1917    3         .6        .6        .6 |
  7. | 1918    3         .7        .6        .6 |
  8. | 1919    3         .8        .6        .6 |
  9. | 1920    3         .9        .6        .6 |
     +------------------------------------------+

For detailed discussion of both methods, see Sections 9 and 10 of this paper.

I don't overwrite the original data, contrary to your request. Often you decide later that you need them after all, or someone asks to see them.

This isn't necessarily better than the solution of @Cybernike. The division method behind wanted2 has struck some experienced users as too tricksy, and I tend to recommend the cond() device behind wanted1.

Upvotes: 1

Bicep
Bicep

Reputation: 1103

The following works for me:

clear
input year id leverage
1917 1 0.1
1918 1 0.2
1919 1 0.3
1917 2 0.4
1918 2 0.5
1917 3 0.6
1918 3 0.7
1919 3 0.8
1920 3 0.9
end

gen leverage1917 = leverage if year == 1917

bysort id: egen min = min(leverage1917)
replace leverage = min
drop min leverage1917

. list, sepby(id)

     +----------------------+
     | year   id   leverage |
     |----------------------|
  1. | 1917    1         .1 |
  2. | 1918    1         .1 |
  3. | 1919    1         .1 |
     |----------------------|
  4. | 1917    2         .4 |
  5. | 1918    2         .4 |
     |----------------------|
  6. | 1917    3         .6 |
  7. | 1918    3         .6 |
  8. | 1919    3         .6 |
  9. | 1920    3         .6 |
     +----------------------+

EDIT NJC

This could be simplified to

generate leverage1917 = leverage if year == 1917 
bysort id (leverage1917) : replace leverage1917 = leverage1917[1] 

thus cutting out the egen call and the generation of another variable you then need to drop. This works properly even if there is no value for 1917 for some values of id.

Upvotes: 1

Related Questions