Reputation: 5000

Median of column and then look up other column value? (R)

I have this table:

Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3
[...]

I want to find out the educational level of the median person and end up with this table:

Doctor     High
Nurse      Medium
[...]

Questions:

How do I do this in R?
And in Excel?

Thanks.

Upvotes: 1

Answers (3)

Matthew Plourde

Reputation: 44614

I recommend @PaulHiemstra's answer if relying on additional packages isn't a problem for your workflow. Otherwise, this is probably the easiest way to do it in base R:

df <- read.csv(text="Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3", header=TRUE)

results <- by(df, INDICES=dat$Profession, 
   FUN=function(subset) with(subset, Educational_level[which.max(Number)]))
data.frame(names(results), unclass(results))

Upvotes: 2

barry houdini

Reputation: 46331

In Excel you could use an "array formula" like this

=INDEX(B2:B10,MATCH(1,(A2:A10="Doctor")*(C2:C10=MEDIAN(IF(A2:A10="Doctor",C2:C10))),0))

confirmed with CTRL+SHIFT+ENTER

That works for your examples but for a genuine median, if there were 4 doctors, for instance, then the median value for doctors is halfway between the 2nd and 3rd, therefore possibly not any value in the list, so how will you know which educational level to assign?

Upvotes: 1

Paul Hiemstra

Reputation: 60924

I don't think you are looking for the median educational level per profession, but the mode, i.e. the category with the highest frequency. To get this you can use ddply from the plyr package:

require(plyr)
ddply(df, .(Profession), summarise, 
          mode_educ = Educational_level[which.max(Number)]

where df is the data.frame which contains your data.

Upvotes: 1

Median of column and then look up other column value? (R)

Answers (3)

Related Questions