qiang

Reputation: 1

Counting values to get a matrix in Stata

I have a Stata dataset with 802 observations, a variable age, and 13 variables x1 to x13. age takes values from 1 to 9, and x1 to x13 take values from 1 to 13.

For each value of age, I want to count how often each of the values 1 to 13 occurs across x1 to x13. For example, for age 1, I want the number of 1s, 2s, 3s, ..., 13s in x1 to x13.

I first converted x1 to x13 to a matrix using

mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)

Then, I want to count using the following loop:

gen count = 0 
quietly forval i = 1/802 {
    quietly forval j = 1/13 { 
       replace count = count + inrange(a[r'i', x'j'], 0, 1), if age==1
    }
}

I failed.

Upvotes: 0

Views: 590

Answers (2)

Nick Cox

Reputation: 37233

With Aspen's example, you could do this:

gen id = _n 
reshape long x, i(id) 
tab age x

Note that your sample code doesn't loop over different ages, and there is a misplaced comma before if in the replace command. I won't try to fix the code, as there are many more direct methods, one of which is above. tabulate also has an option to save the table as a matrix.
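As a rough sketch of that last point: tabulate's matcell() option stores the cell frequencies in a matrix, and matrow() and matcol() store the row and column values. The matrix names freq, ages and values are chosen here purely for illustration, and the data are Aspen's small example rather than the original dataset.

```stata
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end

* one row per (observation, x variable) pair
gen id = _n
reshape long x, i(id)

* freq holds the counts; ages and values hold the row and column labels
tabulate age x, matcell(freq) matrow(ages) matcol(values)
matrix list freq
```

Here freq should have one row per value of age and one column per value of x, so it can be used directly in later matrix calculations.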

Here is another solution closer to the original idea. Warning: code not tested.

matrix count = J(9, 13, 0) 

forval i = 1/9 { 
    forval j = 1/13 { 
        forval J = 1/13 { 
            qui count if age == `i' & x`J' == `j'   
            matrix count[`i', `j'] = count[`i', `j'] + r(N) 
        }
    }
}
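To try the loop on a small case, here is a sketch adapted to Aspen's example data (ages 1 to 3, values 5 to 7, four x variables). The matrix name counts and the column offset are assumptions made for this illustration only.

```stata
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end

* rows index age 1-3, columns index values 5-7
matrix counts = J(3, 3, 0)

forvalues a = 1/3 {
    forvalues v = 5/7 {
        * column 1 holds value 5, column 2 value 6, column 3 value 7
        local c = `v' - 4
        forvalues k = 1/4 {
            quietly count if age == `a' & x`k' == `v'
            matrix counts[`a', `c'] = counts[`a', `c'] + r(N)
        }
    }
}

matrix list counts
```

The innermost loop runs over the x variables, so each cell accumulates the number of observations of a given age in which a given variable equals a given value.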

Upvotes: 1

Aspen Chen

Reputation: 735

I am still somewhat uncertain what you would like to achieve, but if I understand you correctly, here is one way to do it.

First, a simple dataset in which age ranges from one to three and four variables, x1-x4, each take integer values between 5 and 7.

clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end

Then we create three count variables (n5, n6 and n7) that count the number of 5s, 6s, and 7s for each subject across x1-x4.

forval i=5/7    {
    egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}

Here is how the data look now. To explain, the first "1" under n5 indicates that the first subject has only one "5" across x1-x4.

     +----------------------------------------+
     | age   x1   x2   x3   x4   n5   n6   n7 |
     |----------------------------------------|
  1. |   1    5    6    6    6    1    3    0 |
  2. |   1    7    5    6    5    2    1    1 |
  3. |   2    5    7    6    6    1    2    1 |
  4. |   3    5    6    7    7    1    1    2 |
  5. |   3    7    6    6    6    0    3    1 |
     +----------------------------------------+

It sounds to me like your ultimate goal is to have these counts summed separately for each value of age. Assuming this is true, let's create a 3x3 matrix to store the results.

mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7

forval i=5/7    {
    forval j=1/3    {
        qui su n`i' if age==`j'
        loca k=`i'-4 // the first column for value5
        mat A[`j',`k']=r(sum)
    }
}

The matrix looks like this. To explain, the first "3" under value5 indicates that, across all subjects with age 1, the value 5 appears a total of three times in x1-x4.

A[3,3]
      value5  value6  value7
age1       3       4       1
age2       1       2       1
age3       1       4       3
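To scale this up to the dimensions in the question (ages 1 to 9, values 1 to 13, variables x1 to x13), something like the following sketch might work. The simulated data only stand in for the real dataset, which I don't have, and the seed is arbitrary; with the real data in memory, the simulation block would be skipped.

```stata
clear
set seed 2718
set obs 802

* simulated stand-in for the real data (assumption)
generate age = ceil(9 * runiform())
forvalues j = 1/13 {
    generate x`j' = ceil(13 * runiform())
}

* n1-n13: per-observation counts of each value across x1-x13
forvalues v = 1/13 {
    egen n`v' = anycount(x1-x13), values(`v')
}

* rows index age 1-9, columns index values 1-13
matrix A = J(9, 13, 0)
forvalues a = 1/9 {
    forvalues v = 1/13 {
        quietly summarize n`v' if age == `a'
        matrix A[`a', `v'] = r(sum)
    }
}

matrix list A
```

Because the values start at 1 here, the value itself can serve as the column index and no offset is needed.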

Upvotes: 1
