Alex_P

Reputation: 11

Does a growth rate variable, calculated with the same interpolated variable, create any problem with panel data in R?

Very thankful in advance.

I have panel data in R for some non-consecutive years (1821, 1831, 1832, 1833, 1837:1875), with population (pop) available for only some of those years. I interpolated the missing values with the "na_interpolation" function, like this:

Panel$pop_interp <- na_interpolation(Panel$pop)

The panel data is ordered by years. My individuals are "counties".

Panel <- Panel %>%
  group_by(county, year) %>%
  arrange(year)

With a nice column of "pop_interp" (interpolated values for population) I tried to compute a population growth rate with the following code:

Panel <- Panel %>% 
  mutate(growth_rate =(pop_interp / lag(pop_interp) - 1)*100)

And here is the problem: the growth rate column is zero for all values. My question is: what am I doing wrong?

This is my first question on Stack Overflow, so please be merciful. I will provide everything I can to get some help from you :) Thanks again in advance. Best, Alejandra.

Here is what the panel data looks like at the end, with the zeros in the growth rate column (image: Panel data).

Upvotes: 1

Views: 88

Answers (1)

Isaiah

Reputation: 2155

To help you get going with reproducible examples, here are a few hints.

It's handy to include the packages you use in a reprex:

library(tibble)
library(dplyr)

It's usually essential to include a data set:

year <- 1837:1875
Panel <- tibble(pop = seq_along(year),
                year = year)

And the simplest code that shows the problem:

Panel <- Panel |>
  group_by(year) |> 
  mutate(growth_rate = lag(pop))

This shows what Stefan predicts: a column of NAs. It would be easy to see NA and take it as being zero. In fact, NA means a value is not available, whereas zero is a value.
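If it is hard to tell NA from zero by eye, a quick check can settle it. This is a minimal sketch that rebuilds the same toy data as above (not your real panel) and tests the column directly:

```r
library(tibble)
library(dplyr)

# Same toy data as above: one row per year, pop is just 1, 2, 3, ...
year <- 1837:1875
Panel <- tibble(pop = seq_along(year), year = year) |>
  group_by(year) |>
  mutate(growth_rate = lag(pop)) |>
  ungroup()

# TRUE if every value in the column is missing
all_missing <- all(is.na(Panel$growth_rate))

# TRUE only if at least one value is a genuine zero
has_zero <- any(Panel$growth_rate == 0, na.rm = TRUE)

all_missing
has_zero
```

Here all_missing should come out TRUE and has_zero FALSE, which means the column holds no zeros at all, only missing values.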

What has happened here? Let's simplify the code a little more:

Panel <- ungroup(Panel) # To undo the grouping, which is easy to forget.
Panel <- Panel |> 
  mutate(growth_rate = lag(pop) * 100) 

Now we have values in growth_rate. In the first version, because we used group_by(year), mutate is applied to each group separately. Each group contains only one year, so when lag looks for the previous row within the group it finds nothing and returns NA. In the ungrouped code, every year except the first has a previous value, so lag returns it. The first year has no previous value, so it gets NA.
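One way to confirm the one-row-per-group explanation is to count the rows in each group. A small sketch, again on the toy data (the names group_sizes and n_rows are just made up for this illustration):

```r
library(tibble)
library(dplyr)

year <- 1837:1875
Panel <- tibble(pop = seq_along(year), year = year)

# Count how many rows fall into each year group
group_sizes <- Panel |>
  group_by(year) |>
  summarise(n_rows = n())

# Largest group size; if this is 1, lag() has nothing to look back to
largest_group <- max(group_sizes$n_rows)
largest_group
```

If the largest group has a single row, any lag computed within those groups can only be NA.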

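You might want to try removing year from the group_by in your code. Since your data is a panel, grouping by county alone lets lag find the previous year while keeping it from reaching across counties.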
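As a rough sketch of what that could look like for a panel, here is a version that groups by county only. The two counties and their population numbers are invented for illustration. Only the column names county, year and pop_interp come from your question:

```r
library(tibble)
library(dplyr)

# Toy panel: two hypothetical counties, three years each (made-up numbers)
Panel <- tibble(
  county     = rep(c("A", "B"), each = 3),
  year       = rep(c(1837, 1838, 1839), times = 2),
  pop_interp = c(100, 110, 121, 200, 210, 231)
)

Panel <- Panel |>
  arrange(county, year) |>   # make sure rows are in time order within county
  group_by(county) |>        # lag() now looks back within the same county
  mutate(growth_rate = (pop_interp / lag(pop_interp) - 1) * 100) |>
  ungroup()                  # drop the grouping so it is not forgotten later

Panel
```

The first year of each county gets NA, because there is nothing before it, and the remaining rows get a percentage change. Note that with gaps in the years, such as 1821 to 1831, this gives the growth between consecutive observations rather than growth per year.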

Hope this helps!

Upvotes: 0
