mats
mats

Reputation: 133

How to index outliers?

I have the data below. How can I determine which author has the highest number of publications?

I try this

   (which(status$researchers==max(status$publications)) 

but it doesn't seem to work.

#PUBLICATIONS

researchers = c("Smith", "Johnson", "Williams", "Brown", "Jones", "Miller", "Davis", "García", "Rodriguez", "Wilson", "Martinez", "Anderson", "Taylor", "Thomas", "Hernandez", "Moore", "Martin", "Jackson", "Thompson", "White", "Lopez", "Lee", "Gonzalez", "Harris", "Clark", "Lewis", "Robinson", "Walker", "Perez", "Hall", "Young", "Allen", "Sanchez", "Wright", "King", "Scott", "Green", "Baker", "Adams", "Nelson", "Hill", "Ramirez", "Campbell", "Mitchell", "Roberts", "Carter", "Phillips", "Evans", "Turner", "Stapel", "Torres", "Parker", "Collins", "Edwards", "Stewart", "Flores", "Morris", "Nguyen", "Murphy", "Rivera", "Cook", "Rogers", "Morgan", "Peterson", "Cooper", "Reed", "Bailey", "Bell", "Gomez", "Kelly", "Howard", "Ward", "Cox", "Diaz", "Richardson", "Wood", "Watson", "Brooks", "Bennett", "Gray", "James", "Reyes", "Cruz", "Hughes", "Price", "Myers", "Long", "Foster ", "Sanders", "Ross", "Morales", "Powell", "Sullivan", "Russell", "Ortiz", "Jenkins", "Gutierrez", "Perry", "Butler", "Barnes", "Fisher", "De Jong", "Jansen", "De Vries", "vd Berg", "Van Dijk", "Bakker", "Janssen", "Visser", "Smit", "Meijer", "De Boer", "Mulder", "De Groot", "Bos", "Smeesters", "Vos", "Peters", "Hendriks", "Van Leeuwen", "Dekker", "Brouwer", "De Wit", "Dijkstra", "Smits", "De Graaf", "Van der Meer", "Muller", "Schmidt", "Schneider", "Fischer", "Meyer", "Weber", "Schulz", "Wagner", "Becker", "Hoffmann", "Wagemakers",  "Molenaar", "Jansen", "White", "Bargh", "Dijksterhuis", "Poldermans", "Kanazawa", "Lynne", "Ling", "Vorst", "Borsboom", "Wicherts")

articles = data.frame(cbind(researchers, publications))
write.table(articles, file = "scientific status.txt", sep = " ")

status = read.table("scientific status.txt", header = TRUE, sep = "", quote = "\"'")     

Upvotes: 0

Views: 1014

Answers (2)

agstudy
agstudy

Reputation: 121568

It is not a general response but here you need just to extract duplicated.

researchers[duplicated(researchers)]
[1] "Jansen" "White"  ## this 2 authors have 1 publications more than others!

To see the ouliers you can do this for example :

plot(table(researchers))

enter image description here

Upvotes: 2

flodel
flodel

Reputation: 89057

It is not clear what your data represents. If it is already aggregated per author, i.e., there is one row per author and the publications column contains the number of publications, do:

status$researchers[which.max(status$publications)]

If instead, your data is not aggregated, i.e., there is one per article, you can do:

tail(sort(table(status$researchers)), 1)

Upvotes: 2

Related Questions