Sabine

Reputation: 1

How to create a “stacked bar graph” of different activities for several timeslots in Stata?

I am using Stata to compare the percentage of time spent on 7 different activities (e.g. work, leisure) across 8 timeslots of the day (3 hours each, from 00:00-03:00 to 21:00-00:00). I would like to make a “stacked bar graph” in which each bar shows the relative size of the activities in one timeslot (so in percentages, adding up to 100 for each bar).

Below and attached is the structure of my data for three example respondents (in real life there are 363 respondents):

respnummer  domain_nr  domain_type  time_00_03  time_03_06  time_06_09  time_09_12  time_12_15  time_15_18  time_18_21  time_21_00
1           1          Home         1           0           0           0           0           0           0           0
1           2          Work         0           0           0           0           1           0           0           0
1           3          Sports       0           0           0           0           0           0           1           0
1           4          Shopping     0           0           0           1           0           0           0           0
1           5          Going out    0           0           0           0           0           0           1           0
1           6          Other        0           0           0           0           0           1           0           0
2           1          Home         1           1           1           1           0           0           0           1
2           2          Education    0           0           0           1           1           1           0           0
2           3          Shopping     0           0           0           0           0           1           1           0
2           4          Going out    0           0           0           0           0           0           0           0
2           5          Other        1           1           1           1           1           1           1           1
3           1          Home         1           1           1           0           0           1           1           1
3           2          Sports       0           0           0           0           0           0           0           0
3           3          Shopping     0           0           0           1           1           0           0           0
3           4          Other        0           0           0           0           0           0           0           0


So far, I only got the following code working in Stata:

graph bar time_*, over(domain_type) stack // works, but other way around and not as percentages added up to 100%


But this is the reverse of what I want: the 8 timeslots on the x-axis and, for each timeslot, a stacked bar of the 7 activities (in percentages). When I tried to swap the variables (see below), Stata gave the following error: “too many variables specified”.

graph bar domain_type, over(time_*) stack // error: too many variables specified

Also, the bars in the first 'wrong' graph do not add up to 100 percent. So in addition, the activity counts need to be standardized as a percentage of the total count for each time interval. For example, for the first timeslot (00:00-03:00), 75% of the activities (3 of the 4) should be classified as “home” (a large segment of 3/4) and 25% (1 of the 4) as “other” (a small segment of 1/4).

How can I create this bar graph correctly?

Upvotes: 0

Views: 925

Answers (1)

Nick Cox

Reputation: 37338

See the concurrent thread "Formatting catplot - stata", which raises similar challenges. Here I use catplot from SSC, which in this case is just a wrapper for graph bar, but I am more familiar with its syntax for percent calculations.

Your variable domain_type looks like a numeric variable with value labels, but you don't give the label mapping, so the example below reads it in as a string variable, Domain_Type.

clear 
input respnummer  domain_nr   str9 Domain_Type time_00_03  time_03_06  time_06_09  time_09_12  time_12_15  time_15_18  time_18_21  time_21_00
1   1   Home        1   0   0   0   0   0   0   0
1   2   Work        0   0   0   0   1   0   0   0
1   3   Sports      0   0   0   0   0   0   1   0
1   4   Shopping    0   0   0   1   0   0   0   0
1   5   "Going out"   0   0   0   0   0   0   1   0
1   6   Other       0   0   0   0   0   1   0   0
2   1   Home        1   1   1   1   0   0   0   1
2   2   Education   0   0   0   1   1   1   0   0
2   3   Shopping    0   0   0   0   0   1   1   0
2   4   "Going out"   0   0   0   0   0   0   0   0
2   5   Other       1   1   1   1   1   1   1   1
3   1   Home        1   1   1   0   0   1   1   1
3   2   Sports      0   0   0   0   0   0   0   0
3   3   Shopping    0   0   0   1   1   0   0   0
3   4   Other       0   0   0   0   0   0   0   0
end 


reshape long time_ , i(respnummer domain_nr) j(when) string 
replace when = subinstr(when, "_", "-", .)

* install first with: ssc install catplot
catplot Domain_Type when [fw=time_], percent(when) asyvars stack recast(bar)
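
A note on why the reshape step is needed: over() in graph bar accepts a single variable, so over(time_*), which expands to eight variables, triggers "too many variables specified". Moving the timeslot into one variable avoids that. A minimal sketch on a made-up two-row dataset (the names id and act are hypothetical, not from the question) shows what reshape long does:

```stata
clear
* Hypothetical toy data: one respondent, two activities, two timeslots
input id str5 act time_00_03 time_03_06
1 Home 1 0
1 Work 0 1
end

* One row per respondent x activity x timeslot; the part of the
* variable name after "time_" is stored in the string variable when
reshape long time_, i(id act) j(when) string

list, sepby(id act)
```

After the reshape there are four rows instead of two, and when can be passed to over() or to catplot as a single variable.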

See also the results using tabplot from the Stata Journal, as with

tabplot Domain_Type when [fw=time_], percent(when) separate(Domain_Type) showval xtitle("")
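
If installing community-contributed commands is not an option, roughly the same stacked graph can be sketched with official commands only, by computing the percentages by hand. The variable names n, total and pct are my own, and this is an illustration under the assumption that the data look like the example in the question, not a tested replacement for catplot:

```stata
clear
input respnummer domain_nr str9 Domain_Type time_00_03 time_03_06 time_06_09 time_09_12 time_12_15 time_15_18 time_18_21 time_21_00
1 1 Home        1 0 0 0 0 0 0 0
1 2 Work        0 0 0 0 1 0 0 0
1 3 Sports      0 0 0 0 0 0 1 0
1 4 Shopping    0 0 0 1 0 0 0 0
1 5 "Going out" 0 0 0 0 0 0 1 0
1 6 Other       0 0 0 0 0 1 0 0
2 1 Home        1 1 1 1 0 0 0 1
2 2 Education   0 0 0 1 1 1 0 0
2 3 Shopping    0 0 0 0 0 1 1 0
2 4 "Going out" 0 0 0 0 0 0 0 0
2 5 Other       1 1 1 1 1 1 1 1
3 1 Home        1 1 1 0 0 1 1 1
3 2 Sports      0 0 0 0 0 0 0 0
3 3 Shopping    0 0 0 1 1 0 0 0
3 4 Other       0 0 0 0 0 0 0 0
end

* Same reshape as above: one row per respondent x activity x timeslot
reshape long time_, i(respnummer domain_nr) j(when) string
replace when = subinstr(when, "_", "-", .)

* Count how often each activity occurs in each timeslot
collapse (sum) n = time_, by(when Domain_Type)

* Express each count as a percentage of the timeslot total
bysort when: egen double total = total(n)
generate double pct = 100 * n / total

* Timeslots on the x-axis, activities stacked within each bar
graph bar (asis) pct, over(Domain_Type) over(when) asyvars stack ///
    ytitle("Percent of activities")
```

For the first timeslot (00-03) this gives 75 for Home and 25 for Other, matching the example in the question. A timeslot with no recorded activity at all would produce missing percentages, which does not occur in the example data.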

Upvotes: 1
