pauljeba

Reputation: 770

Select the string with max length while using group by in data.table in R

My initial sample data was ambiguous, so I have updated my data set:

library(data.table)
a <- data.table(name = c("?", "", "One", "?", "", "Two"), value = c(1, 3, 2, 6, 5, 2), job = c(1, 1, 1, 2, 2, 2))

   name value job
1:    ?     1   1
2:          3   1
3:  One     2   1
4:    ?     6   2
5:          5   2
6:  Two     2   2

I want to group by the column "job" while finding the maximum in column "value" and selecting the "name" which has the maximum length.

My expected output would be:

   name job value
1: One    1     3
2: Two    2     6

I think I want the data.table equivalent of the question "How do I select the longest 'string' from a table when grouping in R".

Upvotes: 2

Views: 2296

Answers (2)

Vincent Bonhomme

Reputation: 7443

I'm not sure you want a dplyr solution but here is one:

library(dplyr)
a %>% group_by(job) %>% slice(which.max(nchar(as.character(name))))

    name value   job
  (fctr) (dbl) (dbl)
1    One     2     1
2    Two     2     2
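When the largest value and the longest name sit on different rows, as in the updated sample data, slice() keeps the value of whichever row holds the longest name. A minimal sketch of a summarise()-based variant that computes the two independently is shown here. It assumes the sample data is rebuilt as a plain data.frame with character names, and `res` is just an illustrative variable name.

```r
library(dplyr)

# Same sample data as in the question, rebuilt as a plain data.frame (assumption)
a <- data.frame(
  name  = c("?", "", "One", "?", "", "Two"),
  value = c(1, 3, 2, 6, 5, 2),
  job   = c(1, 1, 1, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Per group: pick the longest name and the largest value independently
res <- a %>%
  group_by(job) %>%
  summarise(
    name  = name[which.max(nchar(name))],
    value = max(value)
  )

print(res)
```

For the sample data this should give "One" with 3 for job 1 and "Two" with 6 for job 2.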

Upvotes: 2

akrun

Reputation: 887213

We can group by 'job', get the index of the 'name' with the maximum number of characters (nchar), and use it to subset each group's data (.SD).

a[, .SD[which.max(nchar(name))], by = job]
#   job name value
#1:   1  One     2
#2:   2  Two     2

Or get the row index (.I) from which.max for each group, extract the resulting index column ("V1"), and use it to subset the dataset.

a[a[, .I[which.max(nchar(name))], by = job]$V1]
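The two steps of the .I approach can be made visible by storing the intermediate result. This is only a sketch on the question's sample data; `idx` and `res` are illustrative names, not part of the original answer.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Step 1: per group, the row number (in the whole table) of the longest name.
# The unnamed result column is called V1 by default.
idx <- a[, .I[which.max(nchar(name))], by = job]
print(idx)

# Step 2: use those row numbers as an ordinary row subset of the full table
res <- a[idx$V1]
print(res)
```

Here V1 holds rows 3 and 6, so the subset keeps the original column order (name, value, job) and returns complete rows, including each row's own value.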

Update

Based on the new example, where the maximum 'value' is not on the same row as the 'name' with the maximum number of characters, we need to select the two separately.

a[, .(value = max(value), name = name[which.max(nchar(name))]), by = job]
#   job value name
#1:   1     3  One
#2:   2     6  Two
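If the column order from the question (name, job, value) matters, the result can be reordered afterwards. The sketch below uses setcolorder() for that; `res` is an illustrative name. Note also that which.max() returns the first maximum, so if two names in a group tie for length, the earlier one is kept.

```r
library(data.table)

a <- data.table(name  = c("?", "", "One", "?", "", "Two"),
                value = c(1, 3, 2, 6, 5, 2),
                job   = c(1, 1, 1, 2, 2, 2))

# Group maximum of value and longest name, computed independently per job
res <- a[, .(value = max(value), name = name[which.max(nchar(name))]), by = job]

# Reorder the columns by reference to match the layout asked for in the question
setcolorder(res, c("name", "job", "value"))
print(res)
```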

Upvotes: 3
