I'm quite the novice when it comes to R and programming in general, so any help would be very much appreciated. I have a large dataset with over 100 different IDs. Each 'ID' has 4 'Number' values, and each 'Number' has 5 records.
ID Number start end s.mean Error
1 A2 1 61735 23342732 0.0314 2.04
2 A2 1 23345569 54962669 -0.0103 1.98
3 A2 1 54963958 55075062 0.4841 2.79
4 A2 1 55085141 65826284 0.0047 2.00
5 A2 1 65826928 115611498 -0.0241 1.96
6 A2 2 12784 17248573 -0.0037 1.99
7 A2 2 17248890 85480817 -0.0331 1.95
8 A2 2 85481399 89121495 0.0153 2.02
9 A2 2 89122081 89417610 0.3708 2.58
10 A2 2 89418929 89999062 -0.1826 1.76
11 A2 3 162626603 185477402 -0.0759 1.89
12 A2 3 185478957 189050664 0.0080 2.01
13 A2 3 189056732 192873807 -0.0985 1.86
14 A2 3 192874747 192882903 0.9053 3.74
15 A2 3 192886435 197896118 -0.0645 1.91
16 A3 1 61735 23342732 0.0314 2.04
17 A3 1 23345569 54962669 -0.0103 1.98
18 A3 1 54963958 55075062 0.4841 2.79
19 A3 1 55085141 65826284 0.0047 2.00
20 A3 1 65826928 115611498 -0.0241 1.96
I am wondering if it is possible to create a function that subtracts 'start' from 'end' for each record, to determine which record is the longest within each 'Number'. I was hoping for an output such as...
ID Number Length
1 A2 1 xxxxxx
2 A2 2 xxxxxx
3 A2 3 xxxxxx
4 A3 1 xxxxxx
where 'xxxxxx' is the longest length calculated for that ID/Number combination.
Would it also be possible to select the largest 'Error' of each 'Number' using a function, perhaps with output similar to the Length output above?
Not too sure how to tackle this. Again, any help would be much appreciated.
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'ID' and 'Number', take the difference of 'end' and 'start', and find the max:
library(data.table)
setDT(df1)[, .(Length = max(end - start)), .(ID, Number)]
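The call above returns only the maximum length per group. If you also want to see *which* record is the longest (its start, end and Error), here is a minimal base R sketch, swapped in for data.table so it runs without extra packages. It assumes the columns are named as in the question (s.mean is left out) and uses only the A2 rows for Numbers 1 and 2 as sample data; split() groups the rows and which.max() picks the first row with the largest length in each group.

```r
# A few rows from the question (ID A2, Numbers 1 and 2); s.mean omitted
df1 <- data.frame(
  ID     = "A2",
  Number = rep(1:2, each = 5),
  start  = c(61735, 23345569, 54963958, 55085141, 65826928,
             12784, 17248890, 85481399, 89122081, 89418929),
  end    = c(23342732, 54962669, 55075062, 65826284, 115611498,
             17248573, 85480817, 89121495, 89417610, 89999062),
  Error  = c(2.04, 1.98, 2.79, 2.00, 1.96,
             1.99, 1.95, 2.02, 2.58, 1.76)
)

# Length of each record
df1$Length <- df1$end - df1$start

# One data frame per ID/Number combination (empty combinations dropped)
groups <- split(df1, list(df1$ID, df1$Number), drop = TRUE)

# Keep the first row with the largest Length in each group, then stack them
longest <- do.call(rbind, lapply(groups, function(g) g[which.max(g$Length), ]))

# Order by ID, then Number, and reset the row names
longest <- longest[order(longest$ID, longest$Number), ]
rownames(longest) <- NULL
longest
```

Note that the longest record's Error is not necessarily the group's largest Error: for A2/1 the longest record has Error 1.96, while the group maximum is 2.79.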
Or with dplyr
library(dplyr)
df1 %>%
  group_by(ID, Number) %>%
  summarise(Length = max(end - start))
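For the second part of the question (the largest 'Error' per 'Number'), both summaries can be computed in one step. In data.table this would mean adding Error = max(Error) inside the .() of the call above. The sketch below is a base R alternative using aggregate(), assuming the question's column names and its 20 sample rows (s.mean omitted); the Length column is created first so that the cbind() on the left of the formula keeps readable column names.

```r
# Sample rows from the question; s.mean omitted
df1 <- data.frame(
  ID     = rep(c("A2", "A3"), times = c(15, 5)),
  Number = c(rep(1:3, each = 5), rep(1, 5)),
  start  = c(61735, 23345569, 54963958, 55085141, 65826928,
             12784, 17248890, 85481399, 89122081, 89418929,
             162626603, 185478957, 189056732, 192874747, 192886435,
             61735, 23345569, 54963958, 55085141, 65826928),
  end    = c(23342732, 54962669, 55075062, 65826284, 115611498,
             17248573, 85480817, 89121495, 89417610, 89999062,
             185477402, 189050664, 192873807, 192882903, 197896118,
             23342732, 54962669, 55075062, 65826284, 115611498),
  Error  = c(2.04, 1.98, 2.79, 2.00, 1.96,
             1.99, 1.95, 2.02, 2.58, 1.76,
             1.89, 2.01, 1.86, 3.74, 1.91,
             2.04, 1.98, 2.79, 2.00, 1.96)
)

# Length of each record
df1$Length <- df1$end - df1$start

# Largest Length and largest Error within each ID/Number group
res <- aggregate(cbind(Length, Error) ~ ID + Number, data = df1, FUN = max)

# Order rows like the output requested in the question
res <- res[order(res$ID, res$Number), ]
rownames(res) <- NULL
res
```

aggregate() lists groups with the first grouping variable varying fastest, which is why the result is re-sorted with order() before printing.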