Reputation: 1
I am using Stata to compare the percentage of time spent on 7 different activities (e.g. work, leisure) across 8 timeslots of the day (3 hours each, ranging from 00:00-03:00 to 21:00-00:00). I would like to make a “stacked bar graph” in which each bar shows the relative size of the activities in one timeslot (in percentages, adding up to 100 for each bar).
Below is the structure of my data for three example respondents (the real data have 363 respondents):
[variables]
respnummer domain_nr domain_type
time_00_03 time_03_06 time_06_09 time_09_12 time_12_15 time_15_18 time_18_21 time_21_00
[data]
1 1 Home 1 0 0 0 0 0 0 0
1 2 Work 0 0 0 0 1 0 0 0
1 3 Sports 0 0 0 0 0 0 1 0
1 4 Shopping 0 0 0 1 0 0 0 0
1 5 Going out 0 0 0 0 0 0 1 0
1 6 Other 0 0 0 0 0 1 0 0
2 1 Home 1 1 1 1 0 0 0 1
2 2 Education 0 0 0 1 1 1 0 0
2 3 Shopping 0 0 0 0 0 1 1 0
2 4 Going out 0 0 0 0 0 0 0 0
2 5 Other 1 1 1 1 1 1 1 1
3 1 Home 1 1 1 0 0 1 1 1
3 2 Sports 0 0 0 0 0 0 0 0
3 3 Shopping 0 0 0 1 1 0 0 0
3 4 Other 0 0 0 0 0 0 0 0
So far, I only got the following code working in Stata:
graph bar time_*, over(domain_type) stack
// works, but other way around and not as percentages added up to 100%
But this is the reverse of what I want: the 8 timeslots on the x-axis and, for each timeslot, a stacked bar of the 7 activities (in percentages). When I tried to swap the variables (see below), Stata gave the following error: “too many variables specified”.
graph bar domain_type, over(time_*) stack
// error: too many variables specified
Also, the percentages in the first 'wrong' graph do not add up to 100. The activity counts therefore also need to be standardized as a percentage of the total count in each timeslot. For example, in the first timeslot (00:00-03:00), 75% of the activities (3 of 4) should be classified as “home” (a large segment of 3/4) and 25% (1 of 4) as “other” (a small segment of 1/4).
How can I create this bar graph correctly?
Upvotes: 0
Views: 925
Reputation: 37338
See the concurrent thread "Formatting catplot - stata", which raises similar challenges. Here I use catplot from SSC, which in this case is just a wrapper for graph bar, but I am more familiar with its syntax for percent calculations.
Your variable domain_type looks like a numeric variable with value labels, but you don't give the label mapping, so the example below reads it in as a string variable.
clear
input respnummer domain_nr str9 Domain_Type time_00_03 time_03_06 time_06_09 time_09_12 time_12_15 time_15_18 time_18_21 time_21_00
1 1 Home 1 0 0 0 0 0 0 0
1 2 Work 0 0 0 0 1 0 0 0
1 3 Sports 0 0 0 0 0 0 1 0
1 4 Shopping 0 0 0 1 0 0 0 0
1 5 "Going out" 0 0 0 0 0 0 1 0
1 6 Other 0 0 0 0 0 1 0 0
2 1 Home 1 1 1 1 0 0 0 1
2 2 Education 0 0 0 1 1 1 0 0
2 3 Shopping 0 0 0 0 0 1 1 0
2 4 "Going out" 0 0 0 0 0 0 0 0
2 5 Other 1 1 1 1 1 1 1 1
3 1 Home 1 1 1 0 0 1 1 1
3 2 Sports 0 0 0 0 0 0 0 0
3 3 Shopping 0 0 0 1 1 0 0 0
3 4 Other 0 0 0 0 0 0 0 0
end
reshape long time_ , i(respnummer domain_nr) j(when) string
replace when = subinstr(when, "_", "-", .)
* install catplot first with: ssc install catplot
catplot Domain_Type when [fw=time_], percent(when) asyvars stack recast(bar)
See also the results from tabplot (from the Stata Journal), for example with
tabplot Domain_Type when [fw=time_], percent(when) separate(Domain_Type) showval xtitle("")
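To check the percentages that these graphs display, they can also be computed by hand. The following is a minimal sketch, assuming the long-format data created by the reshape command above are still in memory; the variable names n, total and pct are my own choices for illustration and are not part of the original data.

```stata
* Assumption: data are in the long layout produced by reshape above
* (one row per respondent, activity and timeslot; time_ is 0 or 1).
preserve

* Count how often each activity occurs in each timeslot
collapse (sum) n = time_, by(Domain_Type when)

* Express each count as a percentage of the timeslot total
bysort when: egen total = total(n)
generate pct = 100 * n / total

* Inspect the first timeslot
list Domain_Type n pct if when == "00-03" & n > 0, noobs

* Draw the stacked bars directly from the computed percentages
graph bar (asis) pct, over(Domain_Type) over(when) asyvars stack ytitle("Percent")

restore
```

For the example data, the first timeslot should list Home at 75% (3 of 4) and Other at 25% (1 of 4), which matches the figures asked for in the question. A timeslot with no recorded activity at all would give missing percentages, which does not occur in these three respondents.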
Upvotes: 1