Reputation: 5000
I have this table:
Profession Educational_level Number
Doctor Low 0
Doctor Medium 5
Doctor High 8
Nurse Low 1
Nurse Medium 8
Nurse High 3
[...]
I want to find out the educational level of the median person and end up with this table:
Doctor High
Nurse Medium
[...]
Questions:
Thanks.
Upvotes: 1
Views: 270
Reputation: 44614
I recommend @PaulHiemstra's answer if relying on additional packages isn't a problem for your workflow. Otherwise, this is probably the easiest way to do it in base R:
df <- read.csv(text="Profession Educational_level Number
Doctor Low 0
Doctor Medium 5
Doctor High 8
Nurse Low 1
Nurse Medium 8
Nurse High 3", header=TRUE)
results <- by(df, INDICES=dat$Profession,
FUN=function(subset) with(subset, Educational_level[which.max(Number)]))
data.frame(names(results), unclass(results))
Upvotes: 2
Reputation: 46331
In Excel you could use an "array formula" like this
=INDEX(B2:B10,MATCH(1,(A2:A10="Doctor")*(C2:C10=MEDIAN(IF(A2:A10="Doctor",C2:C10))),0))
confirmed with CTRL+SHIFT+ENTER
That works for your examples but for a genuine median, if there were 4 doctors, for instance, then the median value for doctors is halfway between the 2nd and 3rd, therefore possibly not any value in the list, so how will you know which educational level to assign?
Upvotes: 1
Reputation: 60924
I don't think you are looking for the median educational level per profession, but the mode, i.e. the category with the highest frequency. To get this you can use ddply
from the plyr package:
require(plyr)
ddply(df, .(Profession), summarise,
mode_educ = Educational_level[which.max(Number)]
where df
is the data.frame which contains your data.
Upvotes: 1