Reputation: 133

How to index outliers?

I have the data below. How can I determine which author has the highest number of publications?

I tried this:

   which(status$researchers==max(status$publications))

but it doesn't seem to work.

#PUBLICATIONS

researchers = c("Smith", "Johnson", "Williams", "Brown", "Jones", "Miller", "Davis", "García", "Rodriguez", "Wilson", "Martinez", "Anderson", "Taylor", "Thomas", "Hernandez", "Moore", "Martin", "Jackson", "Thompson", "White", "Lopez", "Lee", "Gonzalez", "Harris", "Clark", "Lewis", "Robinson", "Walker", "Perez", "Hall", "Young", "Allen", "Sanchez", "Wright", "King", "Scott", "Green", "Baker", "Adams", "Nelson", "Hill", "Ramirez", "Campbell", "Mitchell", "Roberts", "Carter", "Phillips", "Evans", "Turner", "Stapel", "Torres", "Parker", "Collins", "Edwards", "Stewart", "Flores", "Morris", "Nguyen", "Murphy", "Rivera", "Cook", "Rogers", "Morgan", "Peterson", "Cooper", "Reed", "Bailey", "Bell", "Gomez", "Kelly", "Howard", "Ward", "Cox", "Diaz", "Richardson", "Wood", "Watson", "Brooks", "Bennett", "Gray", "James", "Reyes", "Cruz", "Hughes", "Price", "Myers", "Long", "Foster ", "Sanders", "Ross", "Morales", "Powell", "Sullivan", "Russell", "Ortiz", "Jenkins", "Gutierrez", "Perry", "Butler", "Barnes", "Fisher", "De Jong", "Jansen", "De Vries", "vd Berg", "Van Dijk", "Bakker", "Janssen", "Visser", "Smit", "Meijer", "De Boer", "Mulder", "De Groot", "Bos", "Smeesters", "Vos", "Peters", "Hendriks", "Van Leeuwen", "Dekker", "Brouwer", "De Wit", "Dijkstra", "Smits", "De Graaf", "Van der Meer", "Muller", "Schmidt", "Schneider", "Fischer", "Meyer", "Weber", "Schulz", "Wagner", "Becker", "Hoffmann", "Wagemakers",  "Molenaar", "Jansen", "White", "Bargh", "Dijksterhuis", "Poldermans", "Kanazawa", "Lynne", "Ling", "Vorst", "Borsboom", "Wicherts")

articles = data.frame(cbind(researchers, publications))
write.table(articles, file = "scientific status.txt", sep = " ")

status = read.table("scientific status.txt", header = TRUE, sep = "", quote = "\"'")

Upvotes: 0

Answers (2)

agstudy

Reputation: 121608

This is not a general solution, but here you just need to extract the duplicated names.

researchers[duplicated(researchers)]
[1] "Jansen" "White"  ## these 2 authors have 1 more publication than the others!
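As a rough sketch of why this is not general: duplicated() flags every occurrence after the first, so a name that appears three times comes back twice. The vector `toy` below is invented for illustration and is not the data from the question.

```r
# Hypothetical toy vector: "Jansen" appears three times, "White" twice.
toy <- c("Smith", "Jansen", "White", "Jansen", "White", "Jansen")

# duplicated() is TRUE for every occurrence after the first one,
# so "Jansen" is returned twice here.
repeats <- toy[duplicated(toy)]

# unique() reduces this to the names that appear more than once.
repeated_names <- unique(repeats)

print(repeats)
print(repeated_names)
```

With the data in the question each repeated name appears exactly twice, so the shortcut is enough there.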

To see the outliers, you can do this, for example:

plot(table(researchers))

Upvotes: 2

flodel

Reputation: 89097

It is not clear what your data represents. If it is already aggregated per author, i.e., there is one row per author and the publications column contains the number of publications, do:

status$researchers[which.max(status$publications)]
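A minimal sketch of the aggregated case, assuming a small made-up `status` data frame (the names and counts below are invented, not taken from the question). It also shows why the attempt in the question returns nothing, and one possible way to keep ties, since which.max() only returns the first maximum.

```r
# Hypothetical aggregated data: one row per author (values are invented).
status <- data.frame(
  researchers  = c("Smith", "Jansen", "White"),
  publications = c(4, 9, 9),
  stringsAsFactors = FALSE
)

# The attempt from the question compares author names with a number,
# so no element matches and which() returns an empty vector.
no_match <- which(status$researchers == max(status$publications))

# which.max() gives the index of the first maximum only.
first_top <- status$researchers[which.max(status$publications)]

# Comparing the counts with their maximum keeps every tied author.
all_top <- status$researchers[status$publications == max(status$publications)]

print(first_top)
print(all_top)
```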

If instead your data is not aggregated, i.e., there is one row per article, you can do:

tail(sort(table(status$researchers)), 1)
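For the unaggregated case, here is a small sketch on invented data (again, `status` below is hypothetical). tail(..., 1) returns a single entry, so if several authors share the top count you may prefer to compare the counts with their maximum instead.

```r
# Hypothetical unaggregated data: one row per article (names are invented).
status <- data.frame(
  researchers = c("Smith", "Jansen", "White", "Jansen", "White"),
  stringsAsFactors = FALSE
)

# Number of rows (articles) per author.
counts <- table(status$researchers)

# Last entry of the sorted counts: a single author with the highest count.
top_one <- tail(sort(counts), 1)

# All authors whose count equals the maximum, which keeps ties.
all_top <- names(counts)[counts == max(counts)]

print(top_one)
print(all_top)
```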

Upvotes: 2
