Selecting first and last value based on binary column value

Question

I am using R and I have a dataframe with over 23,000 entries like so:

 df[1:10,10:13]
   Start   End Distance      under50
1    248  1453        3            0
2   1450  2115      -30            0
3   2145  3254      -68            1
4   3322  5643       -3            0
5   5646  7451       -3            0
6   7454  8641      -14            0
7   8655  9860     -120            1
8   9980 11413      -65            1
9  11478 12614     -364            1
10 12978 13187     -275            1

What I am trying to do is based on the value in df[,4] starting from 0 and ending on 1, select the first number from df[,1] and the number in df[,2] with the column ending on 1.

So for example for this data row #1 starts with 0, row 2 is also 0, and row 3 is 1. So I would like to take 248 from row 1 and 3254 from row 3 then put these in a new dataframe with the column names Start and End.

Then I would like this to loop to find every instance within the dataframe. So for this example it would also pull the values for row start and row 7 end.

So in the end I would like an extracted dataframe that looks like this:

  Start   End      
1    248  3254                 
2   3322  9860
3   ####  ####

I attempted to write a loop that selected start if df[,4]==0 and end based on df[,4]==1 but it was selecting every start with df[,4]==0

I appreciate any help I can get!

Marius · Accepted Answer

Something like this should work:

# The under50 flag is 1 and has just gone up by 1: end of a group
df$group_end = (df$under50 == 1) & (c(0, diff(df$under50)) == 1)
# The under50 flag is 0 and has just gone down by 1 
#   (always true for first row): start of a group
df$group_start = c(-1, diff(df$under50)) == -1

starts = df$Start[df$group_start]
ends = df$End[df$group_end]

You can look at the group_end and group_start columns to check how it's working. Here's the output from your example dataset:

> starts
[1]  248 3322
> ends
[1] 3254 9860

You may have to modify this slightly depending on how your actual dataset looks towards the final rows, it may identify an extra start without an end.

Selecting first and last value based on binary column value

Answers (1)

Related Questions