Reputation: 3
I am using R and I have a dataframe with over 23,000 entries like so:
df[1:10,10:13]
Start End Distance under50
1 248 1453 3 0
2 1450 2115 -30 0
3 2145 3254 -68 1
4 3322 5643 -3 0
5 5646 7451 -3 0
6 7454 8641 -14 0
7 8655 9860 -120 1
8 9980 11413 -65 1
9 11478 12614 -364 1
10 12978 13187 -275 1
What I am trying to do is based on the value in df[,4] starting from 0 and ending on 1, select the first number from df[,1] and the number in df[,2] with the column ending on 1.
So for example for this data row #1 starts with 0, row 2 is also 0, and row 3 is 1. So I would like to take 248 from row 1 and 3254 from row 3 then put these in a new dataframe with the column names Start and End.
Then I would like this to loop to find every instance within the dataframe. So for this example it would also pull the values for row start and row 7 end.
So in the end I would like an extracted dataframe that looks like this:
Start End
1 248 3254
2 3322 9860
3 #### ####
I attempted to write a loop that selected start if df[,4]==0 and end based on df[,4]==1 but it was selecting every start with df[,4]==0
I appreciate any help I can get!
Upvotes: 0
Views: 105
Reputation: 60150
Something like this should work:
# The under50 flag is 1 and has just gone up by 1: end of a group
df$group_end = (df$under50 == 1) & (c(0, diff(df$under50)) == 1)
# The under50 flag is 0 and has just gone down by 1
# (always true for first row): start of a group
df$group_start = c(-1, diff(df$under50)) == -1
starts = df$Start[df$group_start]
ends = df$End[df$group_end]
You can look at the group_end
and group_start
columns to check how it's working. Here's the output from your example dataset:
> starts
[1] 248 3322
> ends
[1] 3254 9860
You may have to modify this slightly depending on how your actual dataset looks towards the final rows, it may identify an extra start without an end.
Upvotes: 2