Reputation: 11
I am very much new to R, and I am struggling with finding a way to count unique values across two variables, and have the count sorted by two other variables. My data has rows for each person, and it records their state and area (PUMA) of residence, as well as using codes for the countries of birth of each person's mom and dad (as separate variables). What I want to do is have a frequency count for the countries of birth of both parents, for every group of people in the same state and area, so that I can say something like, "In PUMA100 of STATE1, 1200 foreign-born parents were from COUNTRY500"
So what I have looks like this (only with 300,000+ rows):
library(data.table)
Have <- data.table(STATE = c(1,1,1,1,2,2,2,2,3,3,3,3),
PUMA = c(100,100,200,300,100,200,200,200,100,200,300,300),
BPL_MOM = c(500,400,800,400,400,800,600,600,800,500,400,600),
BPL_POP = c(900,500,400,400,800,600,500,900,400,600,800,500))
And what I want is this:
Want <- data.table(STATE = c(1,1,1,2,2,3,3,3),
PUMA = c(100,200,300,100,200,100,200,300),
BPL_400 = c(1,1,2,1,0,1,0,1),
BPL_500 = c(2,0,0,0,1,0,1,1),
BPL_600 = c(0,0,0,0,3,0,1,1),
BPL_700 = c(0,0,0,0,0,0,0,0),
BPL_800 = c(0,1,0,1,1,1,0,1),
BPL_900 = c(1,0,0,0,1,0,0,0))
I have tried all sorts of combinations of count, table(unlist(, and basically all the suggestions on the forum, but I still cant get it. I feel it should be simple and there's just something Im not doing right. Any help?
Upvotes: 1
Views: 63
Reputation: 101099
I guess you need both melt
+ dcast
like below
dcast(
melt(Have, id = c("STATE", "PUMA")),
STATE + PUMA ~ ...,
value.var = "variable"
)
which gives
STATE PUMA 400 500 600 800 900
1: 1 100 1 2 0 0 1
2: 1 200 1 0 0 1 0
3: 1 300 2 0 0 0 0
4: 2 100 1 0 0 1 0
5: 2 200 0 1 3 1 1
6: 3 100 1 0 0 1 0
7: 3 200 0 1 1 0 0
8: 3 300 1 1 1 1 0
Upvotes: 0