user3664020

Reputation: 3020

Reduce a data frame to fewer rows

Let us say I have a data frame "dat" like:

 col1     col2
  12       a
  43       a
  54       a
  11       a
  33       b
  43       b
  34       c
  34       c
  342      c
  343      c

Now I have a vector:

vec <- c("a", "a", "a", "b", "c", "c")

I want to remove the extra rows from "dat" according to "vec": keep only the first 3 rows for "a" (because "a" appears three times in "vec"), only the first row for "b", and only the first 2 rows for "c".

I should get the output as

 col1     col2
  12       a
  43       a
  54       a
  33       b
  34       c
  34       c

What is the fastest way to do this without using a for loop?

Upvotes: 4

Views: 1730

Answers (6)

ThomasIsCoding

Reputation: 101663

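With base R, you can combine subset, ave and table as below. The running index is computed on seq_along(col2) so that it stays numeric even when col2 is a character column:

> subset(dat, ave(seq_along(col2), col2, FUN = seq_along) <= table(vec)[col2])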
  col1 col2
1   12    a
2   43    a
3   54    a
5   33    b
7   34    c
8   34    c
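
The one-liner above can be unpacked into its two ingredients: a running index within each group of col2, and a per-value quota taken from vec. This is a minimal sketch, assuming the data frame is named dat as in the question and that col2 is a character column. The names idx, quota and res are introduced here for illustration only.

```r
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# running position of each row within its col2 group: 1 2 3 4 1 2 1 2 3 4
idx <- ave(seq_along(dat$col2), dat$col2, FUN = seq_along)

# number of rows to keep for each row's value, looked up by name in table(vec)
quota <- as.integer(table(vec)[dat$col2])

# keep a row only while its position is within the quota for its group;
# values of col2 that never occur in vec yield NA and are dropped
res <- dat[!is.na(quota) & idx <= quota, ]
res
```

Splitting the expression this way makes it easy to inspect idx and quota separately when the result is not what you expect.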

Upvotes: 0

Alan Gómez

Reputation: 378

This can be done with base R functions:

Data

    dat <- read.table(header=T, text=
   ' col1     col2
      12       a
      43       a
      54       a
      11       a
      33       b
      43       b
      34       c
      34       c
      342      c
      343      c')
    
    vec <-  c('a','a','a','b','c','c')

Procedure

dat[unlist(lapply(table(vec), function(x) 1:x))+match(vec, dat[,2])-1,]

OUTPUT

  col1 col2
1   12    a
2   43    a
3   54    a
5   33    b
7   34    c
8   34    c

Upvotes: 1

Rich Scriven

Reputation: 99331

Here's another Map() approach.

fvec <- factor(vec)
## find the index for the first occurrence of a new level
m <- match(levels(fvec), dat$col2)

## take tabulate(fvec) consecutive rows starting at each first occurrence
dat[unlist(Map(seq, from = m, length.out = tabulate(fvec))), ]
#   col1 col2
# 1   12    a
# 2   43    a
# 3   54    a
# 5   33    b
# 7   34    c
# 8   34    c

Or you could use rle() after matching:

rl <- rle(match(vec, dat$col2))
dat[unlist(Map(seq, rl$values, length.out = rl$lengths)), ]
#   col1 col2
# 1   12    a
# 2   43    a
# 3   54    a
# 5   33    b
# 7   34    c
# 8   34    c
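
To see why the rle() variant works, it can help to look at the intermediate values. This is a sketch, assuming dat and vec as in the question, that equal values in vec are adjacent, and that the rows for each value are contiguous in dat. The names first_row and rows are illustrative.

```r
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# match() returns, for every element of vec, the FIRST row of dat with that value
first_row <- match(vec, dat$col2)   # 1 1 1 5 7 7

# rle() collapses the repeats: start rows 1, 5, 7 with run lengths 3, 1, 2
rl <- rle(first_row)

# expand each (start, length) pair into consecutive row numbers
rows <- unlist(Map(seq, rl$values, length.out = rl$lengths))
rows
dat[rows, ]
```

If the rows for a value were scattered through dat rather than contiguous, the consecutive row numbers would pick up rows from other groups, so this approach relies on dat being ordered by col2.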

Upvotes: 3

akrun

Reputation: 887213

This could also be done by first creating a sequence column within each group and then joining on it:

library(data.table)
setkey(setDT(dat)[, N:= 1:.N, col2], col2, N)
dat[setDT(list(col2=vec))[, N:=1:.N, col2]][, N:= NULL][]
#   col1 col2
#1:   12    a
#2:   43    a
#3:   54    a
#4:   33    b
#5:   34    c
#6:   34    c
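
For readers who do not use data.table, the same idea (number the rows within each group, then join on the value and that number) can be sketched in base R with merge(). This is an illustration under the assumption that dat and vec look like the question's data, not a translation of the data.table code. The helper data frame want and the result name res are hypothetical.

```r
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# number the rows within each col2 group
dat$N <- ave(seq_along(dat$col2), dat$col2, FUN = seq_along)

# number the requested entries within each distinct value of vec
want <- data.frame(
  col2 = vec,
  N = ave(seq_along(vec), vec, FUN = seq_along),
  stringsAsFactors = FALSE
)

# an inner join on (col2, N) keeps exactly the requested rows
res <- merge(dat, want, by = c("col2", "N"))[, c("col1", "col2")]
res
```

Unlike the index-based answers, this does not require the rows for each value to be contiguous in dat, but merge() sorts the result by the join columns and is likely to be slower than a keyed data.table join on large inputs.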

Upvotes: 3

Steven Beaupré

Reputation: 21621

Using dplyr you could do:

library(dplyr)

#create a data frame with the frequency of each value in vec
tv <- data.frame(table(vec))

#within each col2 group, keep only the first Freq rows
group_by(dat, col2) %>%
  filter(row_number() <= tv$Freq[tv$vec %in% col2])

Which gives:

#Source: local data frame [6 x 2]
#Groups: col2
#
#  col1 col2
#1   12    a
#2   43    a
#3   54    a
#4   33    b
#5   34    c
#6   34    c

Upvotes: 3

LyzandeR

Reputation: 37879

This is a way using split and Map:

Data

dat <- read.table(header=T, text=' col1     col2
  12       a
  43       a
  54       a
  11       a
  33       b
  43       b
  34       c
  34       c
  342      c
  343      c',stringsAsFactors=F)

vec <-  c('a','a','a','b','c','c')

Solution

#count frequencies
tabvec <- table(vec)

data.frame(do.call(rbind,
   #use split to split data.frame according to col2
   #use head to only choose the first n rows according to tabvec
   #convert output into a data.frame
   Map(function(x,y) head(x,y),  split(dat, as.factor(dat$col2)), tabvec)
))

Output:

    col1 col2
a.1   12    a
a.2   43    a
a.3   54    a
b     33    b
c.7   34    c
c.8   34    c

Upvotes: 3
