el_dewey

Reputation: 97

Subset first n occurrences of certain value in dataframe

Suppose I have a matrix (or dataframe):

1  5  8
3  4  9
3  9  6
6  9  3
3  1  2
4  7  2
3  8  6
3  2  7

I would like to select only the first three rows that have "3" as their first entry, as follows:

3  4  9
3  9  6
3  1  2

It is clear to me how to pull out all rows that begin with "3" and it is clear how to pull out just the first row that begins with "3."

But in general, how can I extract the first n rows that begin with "3"?

Furthermore, how can I select just the 3rd and 4th appearances, as follows:

3  1  2
3  8  6

Upvotes: 3

Views: 2154

Answers (3)

akrun

Reputation: 886948

We could also use subset

head(subset(mydf, V1==3),3)
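For the second part of the question (the 3rd and 4th appearances), one possible approach is to take the row positions of the matches with which() and index into them. This is only a sketch; it rebuilds the sample data from the question as mydf, and the names idx, first3 and third_fourth are made up for illustration.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# Row positions of every row whose first column equals 3
idx <- which(mydf$V1 == 3)

# First three matches; head() returns fewer if there are fewer than three
first3 <- mydf[head(idx, 3), ]

# Third and fourth matches (assumes at least four matches exist)
third_fourth <- mydf[idx[3:4], ]
```

Here idx is 2 3 5 7 8, so first3 holds rows 2, 3 and 5 and third_fourth holds rows 5 and 7.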

Update

If we also need the row immediately below each row where V1==3:

i1 <- with(mydf, V1==3)
mydf[sort(unique(c(which(i1),pmin(which(i1)+1L, nrow(mydf))))),]
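To see what that one-liner does, it can be unrolled into steps. The following is a sketch on the question's sample data; the intermediate names hits, following and rows are introduced here for illustration only.

```r
mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

i1 <- mydf$V1 == 3

# Positions of the matching rows
hits <- which(i1)

# Position of the row below each match; pmin() clamps to the last row
# so a match in the final row does not index past the end
following <- pmin(hits + 1L, nrow(mydf))

# Combine both sets, drop duplicates, and restore the original row order
rows <- sort(unique(c(hits, following)))

result <- mydf[rows, ]
```

With this data hits is 2 3 5 7 8 and following is 3 4 6 8 8, so rows ends up as 2 through 8.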

Upvotes: 1

Jaap

Reputation: 83215

Without the need for an extra package:

mydf[mydf$V1==3,][1:3,]

results in:

  V1 V2 V3
2  3  4  9
3  3  9  6
5  3  1  2

When you need the third and fourth matching rows:

mydf[mydf$V1==3,][3:4,]
# or:
mydf[mydf$V1==3,][c(3,4),]
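One caveat with indexing by 1:3: if there are fewer matching rows than requested, the result is padded with rows of NA. Wrapping the filtered data in head() avoids that. A small sketch, using a cut-down, made-up data frame (small_df) that has only two matches:

```r
# Hypothetical data with only two rows where V1 == 3
small_df <- data.frame(V1 = c(1L, 3L, 6L, 3L),
                       V2 = c(5L, 4L, 9L, 1L))

matches <- small_df[small_df$V1 == 3, ]

# Asking for rows 1:3 of a two-row data frame adds a third row of NAs
padded <- matches[1:3, ]

# head() returns at most three rows and never pads
safe <- head(matches, 3)
```

So matches[1:n, ] is fine when at least n matches are guaranteed; otherwise head(matches, n) is probably the safer choice.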

Used data:

mydf <- structure(list(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L), 
                       V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L), 
                       V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L)), 
                  .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -8L))

Bonus material: besides dplyr, you can also do this very efficiently with data.table (see this answer for speed comparisons on large datasets for the different data.table methods). Note that setDT converts mydf to a data.table by reference:

setDT(mydf)[V1==3, head(.SD,3)]
# or:
setDT(mydf)[V1==3, .SD[1:3]]

Upvotes: 5

Gopala

Reputation: 10473

You can do something like this with dplyr to extract the first three rows for each unique value of that column:

library(dplyr)
df %>% arrange(columnName) %>% group_by(columnName) %>% slice(1:3)

If you want to extract only the first three rows where the value of that column is 3, you can try:

df %>% filter(columnName == 3) %>% slice(1:3)

If you want specific rows, you can pass their positions to slice, for example slice(c(3, 4)).
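Applied to the data in the question, this could look roughly as follows. It is a sketch that assumes dplyr is installed and that the columns are named V1, V2 and V3 as in the other answers; first3 and third_fourth are illustrative names.

```r
library(dplyr)

mydf <- data.frame(V1 = c(1L, 3L, 3L, 6L, 3L, 4L, 3L, 3L),
                   V2 = c(5L, 4L, 9L, 9L, 1L, 7L, 8L, 2L),
                   V3 = c(8L, 9L, 6L, 3L, 2L, 2L, 6L, 7L))

# First three rows whose first column is 3
first3 <- mydf %>% filter(V1 == 3) %>% slice(1:3)

# Third and fourth rows whose first column is 3
third_fourth <- mydf %>% filter(V1 == 3) %>% slice(c(3, 4))
```

Unlike base-R indexing with [1:3, ], slice() silently ignores positions beyond the number of available rows, so no NA rows are added when there are fewer matches than requested.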

Upvotes: 2
