dani
dani

Reputation: 5000

Median of column and then look up other column value? (R)

I have this table:

Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3
[...]

I want to find out the educational level of the median person and end up with this table:

Doctor     High
Nurse      Medium
[...]

Questions:

  1. How do I do this in R?
  2. And in Excel?

Thanks.

Upvotes: 1

Views: 270

Answers (3)

Matthew Plourde
Matthew Plourde

Reputation: 44614

I recommend @PaulHiemstra's answer if relying on additional packages isn't a problem for your workflow. Otherwise, this is probably the easiest way to do it in base R:

df <- read.csv(text="Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3", header=TRUE)

results <- by(df, INDICES=dat$Profession, 
   FUN=function(subset) with(subset, Educational_level[which.max(Number)]))
data.frame(names(results), unclass(results))

Upvotes: 2

barry houdini
barry houdini

Reputation: 46331

In Excel you could use an "array formula" like this

=INDEX(B2:B10,MATCH(1,(A2:A10="Doctor")*(C2:C10=MEDIAN(IF(A2:A10="Doctor",C2:C10))),0))

confirmed with CTRL+SHIFT+ENTER

That works for your examples but for a genuine median, if there were 4 doctors, for instance, then the median value for doctors is halfway between the 2nd and 3rd, therefore possibly not any value in the list, so how will you know which educational level to assign?

Upvotes: 1

Paul Hiemstra
Paul Hiemstra

Reputation: 60924

I don't think you are looking for the median educational level per profession, but the mode, i.e. the category with the highest frequency. To get this you can use ddply from the plyr package:

require(plyr)
ddply(df, .(Profession), summarise, 
          mode_educ = Educational_level[which.max(Number)]

where df is the data.frame which contains your data.

Upvotes: 1

Related Questions