Reputation: 31
I'm relatively new to R and am stuck on something that should be simple. I have panel data and I want to plot or count how many observation units experience a change in a specific variable over time. The data look like this:
statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true
So, to be precise, I want to know how many ids experienced a change in x, and/or how many always had true for x, and how many always had false for x.
I tried to create a dummy that equals 1 if x was true at least once and false at least once, i.e., if there was a change. However, it did not work. I also tried the table(), aggregate(), group_by() and count() functions (partly combined), but I just do not get what I want.
Can anybody possibly help?
Upvotes: 2
Views: 437
Reputation: 2022
To make it reproducible:
my_data <- read.csv(text=
"statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true", header=TRUE)
Here is a solution using the dplyr package and the functions mutate() and lag():
library(dplyr)
my_data <- my_data %>%
  group_by(id) %>%
  mutate(xChanged = case_when(x != lag(x) ~ "Yes", TRUE ~ "No")) %>%
  as.data.frame()
Step-by-step, here is what the code above does:
1. Group the data by id.
2. Use the lag() function to look up the previous value of x.
3. If x is different from the previous x, it inputs "Yes" to column xChanged.
4. Otherwise, it inputs "No" to xChanged.
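To see why the first row of every id always ends up as "No", here is a small sketch of what lag() and case_when() do on a toy vector. The vector x_demo is made up for illustration and is not part of the question's data.

```r
library(dplyr)

# Hypothetical toy vector, standing in for the x values of one id
x_demo <- c(FALSE, TRUE, TRUE)

# lag() shifts the values down by one; the first element has no predecessor
prev <- dplyr::lag(x_demo)      # NA FALSE TRUE

# Comparing with NA yields NA, not TRUE
changed <- x_demo != prev       # NA TRUE FALSE

# case_when() treats an NA condition as "not matched", so the fallback applies
flag <- case_when(changed ~ "Yes", TRUE ~ "No")
flag                            # "No" "Yes" "No"
```

Note that this comparison depends on the row order within each id, so the data should be sorted by time (e.g. by from) before applying it.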
The output will look like this:
> my_data
      statename from   to     id     x xChanged
1 United States 1946 1965 201000             No
2 United States 1946 1965 202000             No
3 United States 1946 1965 203000 false       No
4 United States 1970 1973 203000  true      Yes
5 United States 1946 1965 204000             No
6 United States 1946 1965 205000  true       No
Now you can count how many "Yes" values there are in xChanged:
nrow(my_data[my_data$xChanged == "Yes",])
Result: 1.
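The count above is per row. If you also want the per-id breakdown the question asks for (changed, always true, always false), one possible variant is to summarise each id by which values of x it ever had. This is only a sketch: the names status_by_id, x_chr, n_true, n_false and status are my own, "changed" here means both true and false were observed for that id, and I am assuming that ids with only blank x should be reported separately as "no data". The .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Same sample data as above; na.strings = "" turns blank x cells into NA
my_data <- read.csv(text =
"statename,from,to,id,x
United States,1946,1965,201000,
United States,1946,1965,202000,
United States,1946,1965,203000,false
United States,1970,1973,203000,true
United States,1946,1965,204000,
United States,1946,1965,205000,true", na.strings = "")

status_by_id <- my_data %>%
  # Normalise x to lower-case text so this works whether read.csv
  # parsed the column as logical or as character
  mutate(x_chr = tolower(as.character(x))) %>%
  group_by(id) %>%
  summarise(
    n_true  = sum(x_chr == "true",  na.rm = TRUE),
    n_false = sum(x_chr == "false", na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # One label per id, based on which values were ever observed
  mutate(status = case_when(
    n_true > 0 & n_false > 0 ~ "changed",
    n_true > 0               ~ "always true",
    n_false > 0              ~ "always false",
    TRUE                     ~ "no data"
  ))

# Number of ids in each category
table(status_by_id$status)
```

With the sample data, id 203000 is labelled "changed", id 205000 "always true", and the three ids with blank x fall under "no data".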
Upvotes: 1