Darth Ratus

Reputation: 49

Filtering Data Frame using row_number

I need to extract specific rows from a data frame: rows 2, 9, 14, 19, 24, etc. I can extract rows 9, 14, 19, 24, etc. with the filter command below, which keeps every row whose number is congruent to 4 modulo 5.

Parsed_Data_Frame <- Source_Data_Frame %>% filter(row_number() %%5 == 4)

However, this misses row 2, and it also reads row 4 which I do not need.
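To see why row 4 slips through and row 2 does not match, here is a minimal base R sketch of the same condition applied to the first 24 row numbers (the name `rows` is only for illustration):

```r
# Row numbers 1..24, standing in for row_number() on a 24-row data frame
rows <- seq_len(24)

# Same test as the filter: keep positions where (row number mod 5) equals 4
rows[rows %% 5 == 4]
# 4 9 14 19 24
```

Row 4 satisfies the condition (4 %% 5 is 4), while row 2 does not (2 %% 5 is 2).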

I brute-forced this in a handful of steps. I saved row 2 into one data frame (Header_Data_Frame) and the rows matching the filter into another (Data_Frame), then used rbind to combine the two. Finally, I removed row 2 of the combined data frame, since it holds row 4 of the original data, which I did not want.

        Header_Data_Frame <- Temperature_Data_Frame[2,]
        Data_Frame <- Temperature_Data_Frame %>% filter(row_number() %%5 == 4)
        Junction_Data_Frame <- rbind(Header_Data_Frame,Data_Frame)
        Junction_Data_Frame <- Junction_Data_Frame[-c(2),]

This works but there has to be a more elegant way to do this.

I have added a partial data set below (the source is a text file read into a data frame). The first 6 rows are the header names (these change depending on the data set). A dashed line separates each block of data. For one data frame I need Header2 (which later becomes the column name) and entries 9, 14, 19, 24, and so on (in increments of 5, since a new set of numbers starts every 5 lines). The actual text file has about 1000 lines of data, which is why I was using the modulus operation to extract every row whose number is 4 (mod 5) (i.e., 4, 9, 14, 19, 24, ..., 999).

Header1
Header2
Header3
Header4
Header5
Header6
-----------------------------------
41628
60060
41028
41465
-----------------------------------
41629
60145
41003
41471
-----------------------------------
41700
60083
41076
41534
-----------------------------------
41699
60264
41076
41533

Upvotes: 0

Views: 457

Answers (1)

ScottyJ

Reputation: 1087

Maybe a compound filter statement would work for you? The condition keeps rows whose number is 4 modulo 5 and greater than 4, plus row 2. The mutate() call only adds a row-number column, rn, to show each row's original position before the filtering happens; you wouldn't need it in your own code.

> mtcars %>% 
    mutate(rn = row_number(), .before = 1) %>% 
    filter((row_number() %% 5 == 4 & row_number() > 4) | row_number() == 2)

               rn  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag   2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Merc 230        9 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 450SLC    14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Honda Civic    19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Camaro Z28     24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Ford Pantera L 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
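An alternative worth considering, sketched here under the assumption that the data frame has at least 9 rows (otherwise seq() fails), is to build the row positions explicitly and pass them to slice(). The name `picked` is only for illustration, and mtcars again stands in for the real data:

```r
library(dplyr)

# Keep row 2, then every 5th row starting at row 9.
# n() is the number of rows in the data frame being sliced.
picked <- mtcars %>%
  mutate(rn = row_number(), .before = 1) %>%  # only to display the original positions
  slice(c(2, seq(9, n(), by = 5)))

picked$rn
# 2 9 14 19 24 29
```

The same index vector also works in base R, e.g. mtcars[c(2, seq(9, nrow(mtcars), by = 5)), ]. Stating the positions directly avoids having to exclude row 4 with a second condition.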

Upvotes: 1
