Reputation: 3020
Let us say I have a data frame "dat" like:
col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c
Now I have a vector:
vec <- c("a", "a", "a", "b", "c", "c")
I want to remove the extra rows in "dat" according to "vec": keep only the first 3 rows where col2 is "a", only the first row where col2 is "b", and only the first 2 rows where col2 is "c".
I should get the output as
col1 col2
12 a
43 a
54 a
33 b
34 c
34 c
What is the fastest way to do this without using a for loop?
Upvotes: 4
Views: 1730
Reputation: 101663
With base R, you can use subset + ave + table, like below:
> subset(dat, ave(seq_along(col2), col2, FUN = seq_along) <= table(vec)[col2])
col1 col2
1 12 a
2 43 a
3 54 a
5 33 b
7 34 c
8 34 c
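The filter condition combines two pieces: a running row number within each col2 group, and a per-row quota looked up from table(vec). The following is a minimal sketch that computes them separately, assuming the question's data with col2 stored as character and every value of col2 present in vec. The names idx, quota and res are illustrative only.

```r
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# running row number within each col2 group
idx <- ave(seq_along(dat$col2), dat$col2, FUN = seq_along)

# rows to keep for each row's group, looked up by name in the counts of vec
# (assumption: every value of col2 occurs in vec)
quota <- as.integer(table(vec)[dat$col2])

# keep a row only while its within-group position is within the quota
res <- dat[idx <= quota, ]
```

Using seq_along(dat$col2) as the first argument of ave keeps the counter numeric, so the comparison with the counts is a numeric one rather than a string comparison.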
Upvotes: 0
Reputation: 378
This can be done using base R functions:
Data
dat <- read.table(header=TRUE, text=
' col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c')
vec <- c('a','a','a','b','c','c')
Procedure
dat[unlist(lapply(table(vec), function(x) 1:x))+match(vec, dat[,2])-1,]
OUTPUT
col1 col2
1 12 a
2 43 a
3 54 a
5 33 b
7 34 c
8 34 c
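The index arithmetic above lines up the per-group offsets from table(vec), which are ordered by sorted value, with match(vec, ...), which follows the order of vec. That alignment holds when vec is already grouped in sorted order, as in the question. A hedged sketch for a shuffled vec is to sort it first; it still assumes the rows of each group are contiguous in dat. The names svec, offs and rows are illustrative only.

```r
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
# same counts as in the question, but shuffled
vec <- c("c", "a", "b", "a", "c", "a")

# sort so that the order of vec agrees with the order of table()
svec <- sort(vec)

# within-group offsets 1..n for each value
offs <- sequence(as.vector(table(svec)))

# first row of each value's group in dat, shifted by the offset
rows <- offs + match(svec, dat$col2) - 1L

res <- dat[rows, ]
```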
Upvotes: 1
Reputation: 99331
Here's another Map() approach.
fvec <- factor(vec)
## find the index for the first occurrence of a new level
m <- match(levels(fvec), dat$col2)
dat[unlist(Map(seq, from = m, length.out = tabulate(fvec))), ]
# col1 col2
# 1 12 a
# 2 43 a
# 3 54 a
# 5 33 b
# 7 34 c
# 8 34 c
Or you could use rle() after matching:
rl <- rle(match(vec, dat$col2))
dat[unlist(Map(seq, rl$values, length.out = rl$lengths)), ]
# col1 col2
# 1 12 a
# 2 43 a
# 3 54 a
# 5 33 b
# 7 34 c
# 8 34 c
Upvotes: 3
Reputation: 887213
This could also be done by first creating a sequence column:
library(data.table)
setkey(setDT(dat)[, N:= 1:.N, col2], col2, N)
dat[setDT(list(col2=vec))[, N:=1:.N, col2]][, N:= NULL][]
# col1 col2
#1: 12 a
#2: 43 a
#3: 54 a
#4: 33 b
#5: 34 c
#6: 34 c
Upvotes: 3
Reputation: 21621
Using dplyr you could do:
library(dplyr)
# create a data frame with frequencies
tv <- data.frame(table(vec))
#filter values
group_by(dat, col2) %>%
filter(row_number() <= tv$Freq[tv$vec %in% col2])
Which gives:
#Source: local data frame [6 x 2]
#Groups: col2
#
# col1 col2
#1 12 a
#2 43 a
#3 54 a
#4 33 b
#5 34 c
#6 34 c
Upvotes: 3
Reputation: 37879
This is a way using split and Map:
Data
dat <- read.table(header=TRUE, text=' col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c', stringsAsFactors=FALSE)
vec <- c('a','a','a','b','c','c')
Solution
#count frequencies
tabvec <- table(vec)
data.frame(do.call(rbind,
#use split to split data.frame according to col2
#use head to only choose the first n rows according to tabvec
#convert output into a data.frame
Map(function(x,y) head(x,y), split(dat, as.factor(dat$col2)), tabvec)
))
Output:
col1 col2
a.1 12 a
a.2 43 a
a.3 54 a
b 33 b
c.7 34 c
c.8 34 c
Upvotes: 3