Laura

Reputation: 31

Count/plot changes over time for a variable in panel data with R

I'm relatively new to R and got stuck on something that should be simple. I have panel data and I want to plot or count how many observation units experience a change in a specific variable over time. The data looks like this:

statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true

To be precise, I want to count how many ids experienced a change in x, and/or how many always had x = true and how many always had x = false.

I tried to make a dummy that equals 1 if x was true at least once and false at least once, i.e., if there was a change, but it did not work. I also tried to approach the problem with the table(), aggregate(), group_by() and count() functions (partly in combination), but I just do not get what I want.
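A minimal base-R sketch of that dummy, assuming blank x values should count as missing (NA) rather than as a third state. It uses tapply() and any() rather than the functions listed above, and the names ever_true, ever_false and dummy are only illustrative:

```r
my_data <- read.csv(text =
"statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true", header = TRUE)

# Coerce x to logical whether read.csv kept it as text or not; blanks become NA
x <- as.logical(as.character(my_data$x))

# Per id: was x TRUE at least once, and FALSE at least once? (%in% treats NA as no match)
ever_true  <- tapply(x, my_data$id, function(v) any(v %in% TRUE))
ever_false <- tapply(x, my_data$id, function(v) any(v %in% FALSE))

# Dummy = 1 if the id had both values, i.e. x changed at some point
dummy <- data.frame(
  id      = names(ever_true),
  changed = as.integer(ever_true & ever_false)
)
print(dummy)
sum(dummy$changed)
```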

Can anybody help?

Upvotes: 2

Views: 437

Answers (1)

Werner Hertzog

Reputation: 2022

To make it reproducible:

my_data <- read.csv(text=
"statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true", header=TRUE)

Here is a solution using the dplyr package and its mutate(), lag() and case_when() functions:

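library(dplyr)
my_data <- my_data %>%
  group_by(id) %>%                                    # compare rows only within the same id
  mutate(xChanged = case_when(x != lag(x) ~ "Yes",    # x differs from the previous row's x
                              TRUE ~ "No")) %>%       # everything else, including NA comparisons
  as.data.frame()                                     # drop the grouping, return a plain data frame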

Step-by-step, here is what the code above does:

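  1. It groups your data by id.
  2. It then uses the lag() function to look up the previous value of x within the same id.
  3. If x is different from the previous x, it writes "Yes" to column xChanged.
  4. Otherwise, it writes "No" to column xChanged. This includes the first row of each id: there is no previous value, so lag() returns NA, the comparison is NA, and case_when() falls through to TRUE ~ "No".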
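The comparison in steps 2 to 4 can be reproduced on a plain vector, which shows why the first row of a group ends up as "No". This is a base-R sketch for illustration only: c(NA, head(v, -1)) stands in for lag(), and the explicit NA check mimics how case_when() treats an NA condition like FALSE:

```r
# x values for id 203000, in row order
v <- c(FALSE, TRUE)

# Previous value within the group: the first row has none, so it gets NA
prev <- c(NA, head(v, -1))

# Element-wise comparison: NA for the first row, TRUE where x flipped
cmp <- v != prev

# Only a definite TRUE counts as a change; NA falls through to "No"
changed <- ifelse(!is.na(cmp) & cmp, "Yes", "No")
print(changed)
```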

The output will look like this:

> my_data
      statename from   to     id     x xChanged
1 United States 1946 1965 201000             No
2 United States 1946 1965 202000             No
3 United States 1946 1965 203000 false       No
4 United States 1970 1973 203000  true      Yes
5 United States 1946 1965 204000             No
6 United States 1946 1965 205000  true       No

Now you can count how many "Yes" there are in xChanged.

nrow(my_data[my_data$xChanged == "Yes",])

Result: 1.
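Note that this counts rows where x flips, not ids: an id whose x changed twice would be counted twice, and ids that were always true or always false are not reported separately. A base-R sketch that classifies each id instead, assuming blank x values are missing and should be ignored (classify() and status are illustrative names, not part of any package):

```r
my_data <- read.csv(text =
"statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true", header = TRUE)

# Coerce x to logical; blank values become NA
x <- as.logical(as.character(my_data$x))

# Label one id's sequence of x values
classify <- function(v) {
  v <- v[!is.na(v)]                      # ignore missing x values
  if (length(v) == 0) "no data"          # checked first: all(logical(0)) is TRUE
  else if (all(v)) "always true"
  else if (!any(v)) "always false"
  else "changed"                         # at least one TRUE and one FALSE
}

status <- tapply(x, my_data$id, classify)
table(status)
```

On the sample data this should report one id as "changed" (203000), one as "always true" (205000) and three as "no data"; table() leaves out "always false" because no id falls into that category here.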

Upvotes: 1
