Reputation: 68
I have a list of counties in each state that received nonattainment status in years 1995-2005.
I want to know how many counties in each state each year that received this status.
If my data is formatted like this,
State1 Country1 YR1 Yr2 Yr3 Yr4...
State1 Country2 YR1 Yr2 Yr3 Yr4
State2 County1 Yr1 Yr2.....
Each year variable could have a 1 or a zero, since a county may gain or lose this status in a time period.
I need to count each year how many counties in each state have nonattainment status (YRx=1), but can't think of how to do it.
Upvotes: 0
Views: 243
Reputation: 30455
I used the following example:
data <- read.table(textConnection("
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
state4 county18 1 1 1 1
"), header = T)
library(reshape)
data2 <- melt(data, id = c("state", "county"))
cast(data2, state ~ variable, fun = sum)
Result:
state Yr1 Yr2 Yr3 Yr4
1 state1 1 2 0 2
2 state2 1 1 1 1
3 state3 1 2 2 2
4 state4 2 1 2 1
Upvotes: 2
Reputation: 9037
Is this data organized as a dataframe? If so, how are the rows defined? If your data were organized this way:
State County Year Attainment
State1 County1 1 1
State1 County1 2 0
State1 County1 3 1
State1 County1 4 1
State1 County2 1 1
State1 County2 2 1
...
Then it would be possible to get the kind of summary data you're looking for with 1 line of code. Hopefully your notation means that your data is organized like this:
State County Yr1 Yr2 Yr3 Yr4
State1 County1 1 0 1 1
State1 County2 1 1 1 1
Use melt()
from the reshape
package to get from this format to the one laid out above.
new.df <- melt(df, id = 1:2)
It'll call the Year variable variable
and the Attainment variable value
. Now, with clever use of the cast
function, also from the reshape
package, you can get the summary you want.
counties <- cast(new.df, State ~ value, fun = length)
head(counties)
However, if your data is organized so that every state, county and year is a column, and it's only 1 row long, I think your best next step would be to reformat the data outside of R so that it looks at least like my second example.
Upvotes: 0