Lucy

Reputation: 401

How to take the last occurrence of a repeated string in a data frame in R

My updated data set is shown below:

tmp_number_1  tmp_name_1          ID
7990918840    Yvette              33098
7958376552    Mum                 33098
7951755055    Dad                 33098
7951755055    Dad mob             33098
7581498864    Wynne Lewis         33098
7581498864    Wynne Lewis mob     33098
87128486      James Braithewaite  33098
1869353690    Fleetclaims         33098
447915381850  Kath                33098
919446540717  Sujata Egbert       33098
87124812      Chris  Riley        33098
7958376552    Mum Mob             33098
7958376552    Mum Mob new         33098

Where a "tmp_number_1" value is repeated, I want to keep only the last row for that value.

The output I am looking for is:

tmp_number_1  tmp_name_1          ID
7990918840    Yvette              33098
**7951755055  Dad mob**           33098
**7581498864  Wynne Lewis mob**   33098
87128486      James Braithewaite  33098
1869353690    Fleetclaims         33098
447915381850  Kath                33098
919446540717  Sujata Egbert       33098
87124812      Chris  Riley        33098
**7958376552  Mum Mob new**       33098

Rows marked with ** are the last occurrence of a repeated "tmp_number_1".

Upvotes: 1

Views: 68

Answers (3)

cdeterman

Reputation: 19960

Wouldn't this be a good place to use setkey and unique? Borrowing df1 from @akrun with the updated ID column.

EDIT: As per @Arun's suggestion, setkey is not needed here.

library(data.table)
unique(setDT(df1), by="tmp_number_1", fromLast=TRUE)

   tmp_number_1         tmp_name_1    ID
1:   7990918840             Yvette 33098
2:   7951755055            Dad mob 33098
3:   7581498864    Wynne Lewis mob 33098
4:     87128486 James Braithewaite 33098
5:   1869353690        Fleetclaims 33098
6: 447915381850               Kath 33098
7: 919446540717      Sujata Egbert 33098
8:     87124812       Chris  Riley 33098
9:   7958376552        Mum Mob new 33098
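
For comparison, the same idea can be sketched in base R without any packages, assuming the data is an ordinary data.frame. `duplicated()` with `fromLast = TRUE` flags every row that is followed later by a row with the same `tmp_number_1`, so negating it keeps only the last occurrence of each number. The data frame below is a shortened, illustrative subset of the question's data, and `has_later_dup` and `last_rows` are names made up for this sketch.

```r
# Shortened, illustrative subset of the question's data (not the full set)
df1 <- data.frame(
  tmp_number_1 = c(7990918840, 7958376552, 7951755055, 7951755055,
                   87128486, 7958376552, 7958376552),
  tmp_name_1   = c("Yvette", "Mum", "Dad", "Dad mob",
                   "James Braithewaite", "Mum Mob", "Mum Mob new"),
  ID           = 33098,
  stringsAsFactors = FALSE
)

# TRUE for rows that are followed later by the same tmp_number_1
has_later_dup <- duplicated(df1$tmp_number_1, fromLast = TRUE)

# Keep rows with no later duplicate, i.e. the last occurrence of each number
last_rows <- df1[!has_later_dup, ]
print(last_rows)
```

The rows come back in their original order, and all columns (including ID) are kept.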

Upvotes: 1

akrun

Reputation: 887048

You could try

library(dplyr)
df1 %>% 
    group_by(tmp_number_1) %>% 
    slice(n())

Or

library(data.table)
setDT(df1)[, .SD[.N], tmp_number_1]

Or

# .I holds row numbers within df1, so this also works when duplicates are not adjacent
setDT(df1)[df1[, .I[.N], by = tmp_number_1]$V1]

data

df1 <- structure(list(tmp_number_1 = c(7990918840, 7958376552, 
7951755055, 
7951755055, 7581498864, 7581498864, 87128486, 1869353690, 
447915381850, 
919446540717, 87124812, 7958376552, 7958376552), 
tmp_name_1 = c("Yvette", 
"Mum", "Dad", "Dad mob", "Wynne Lewis", "Wynne Lewis mob",
"James Braithewaite", 
"Fleetclaims", "Kath", "Sujata Egbert", "Chris  Riley", "Mum Mob", 
"Mum Mob new")), .Names = c("tmp_number_1", "tmp_name_1"), 
class = "data.frame", row.names = c(NA, -13L))

Upvotes: 4

Colonel Beauvel

Reputation: 31161

If df is your data.frame, you can try:

 library(data.table)

 setDT(df)[,tail(tmp_name_1,1),by=tmp_number_1]
#   tmp_number_1                 V1
#1:   7990918840             Yvette
#2:   7958376552        Mum Mob new
#3:   7951755055            Dad mob
#4:   7581498864    Wynne Lewis mob
#5:     87128486 James Braithewaite
#6:   1869353690        Fleetclaims
#7: 447915381850               Kath
#8: 919446540717      Sujata Egbert
#9:     87124812        Chris Riley
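
Note that the call above returns only tmp_number_1 and the last tmp_name_1 (as V1), so other columns such as ID are dropped. If whole rows are needed, one possible base R sketch, assuming df is a plain data.frame, is to find the last row index within each group and subset by those indices. The small sample and the names `row_groups`, `last_idx`, and `last_rows_all_cols` are illustrative only.

```r
# Small illustrative sample with the same columns as the question's data
df <- data.frame(
  tmp_number_1 = c(7951755055, 7951755055, 87128486, 7958376552, 7958376552),
  tmp_name_1   = c("Dad", "Dad mob", "James Braithewaite", "Mum Mob", "Mum Mob new"),
  ID           = 33098,
  stringsAsFactors = FALSE
)

# Row numbers grouped by tmp_number_1
row_groups <- split(seq_len(nrow(df)), df$tmp_number_1)

# Last row number in each group
last_idx <- vapply(row_groups, function(i) i[length(i)], integer(1))

# Subset by those row numbers, sorted to restore the original row order
last_rows_all_cols <- df[sort(unname(last_idx)), ]
print(last_rows_all_cols)
```

Because the indices refer to positions in the original data frame, this does not require the duplicates to be adjacent.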

Upvotes: 3
