MERose

Reputation: 4421

gsub() on multiple dataframes in loop/lapply

I have two dataframes, each with a column named 'Title' containing strings. I need to clean these strings so that I can merge the dataframes on them. I want to do this as cleanly as possible in a loop, so that I only have to write the gsub() call once.

Let's say I have:

df_1 <-read.table(text="
id Title
1 some_average_title
2 another:_one
3 the_third!
4 and_'the'_last
",header=TRUE,sep="")

and:

df_2 <-read.table(text="
id Title
1 some_average.title
2 another:one
3 the_third
4 and_the_last
",header=TRUE,sep="")

I would now run:

df_1$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_1$Title )
df_2$Title <- gsub(" |\\.|'|:|!|\\'|_", "", df_2$Title )

I tried the following loop:

for (dtfrm in c("df_1", "df_2")) {
  assign(paste0(dtfrm, "$Title"),
    gsub(" |\\.|'|:|!|\\'|", "", get(paste0(dtfrm, "$Title")))
    )
  }

but it doesn't work - despite the lack of error messages.

I was also thinking about lapply(list(df_1, df_2), function(w){ w$Title <- XXX }), but I don't know what to put for XXX, because gsub() needs the vector of strings as its third argument.

Upvotes: 5

Views: 3069

Answers (3)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Somewhere between @David's comment and @Carlos's answer, with a little bit extra:

Use mget to grab your data.frames, and list2env to copy over the original data.frames if so desired.

mget + lapply will do the transformation....

lapply(mget(ls(pattern = "df_\\d")), function(w)
  transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title)))
# $df_1
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast
# 
# $df_2
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast

... but the result stays in a list and doesn't affect the original data.frames:

# df_1
#   id              Title
# 1  1 some_average_title
# 2  2       another:_one
# 3  3         the_third!
# 4  4     and_'the'_last

If you did want to overwrite the data.frames, try:

list2env(
  lapply(mget(ls(pattern = "df_\\d")), function(w) 
    transform(w, Title = gsub(" |\\.|'|:|!|\\'|_", "", Title))), 
  envir = .GlobalEnv)
df_1
#   id            Title
# 1  1 someaveragetitle
# 2  2       anotherone
# 3  3         thethird
# 4  4       andthelast
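
As for the XXX in the question's lapply() idea, a minimal sketch is below. It assumes the sample data from the question (rebuilt here with data.frame() so the block runs on its own), and the helper name clean_title is made up for illustration. The function has to return the whole data.frame; otherwise lapply() would collect only the value of the assignment, i.e. the cleaned character vector.

```r
# Sample data equivalent to the question's df_1 and df_2
df_1 <- data.frame(id = 1:4,
                   Title = c("some_average_title", "another:_one",
                             "the_third!", "and_'the'_last"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(id = 1:4,
                   Title = c("some_average.title", "another:one",
                             "the_third", "and_the_last"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: clean the Title column and return the data.frame
clean_title <- function(w) {
  w$Title <- gsub(" |\\.|'|:|!|\\'|_", "", w$Title)
  w  # return the modified data.frame, not just the assigned vector
}

# A named list keeps track of which result belongs to which input
cleaned <- lapply(list(df_1 = df_1, df_2 = df_2), clean_title)
cleaned$df_1$Title
```

The originals are untouched here as well; cleaned can be passed to list2env() as above if they should be overwritten.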

Upvotes: 1

Carlos Cinelli

Reputation: 11597

This works:

for(df in c("df_1", "df_2")){
  assign(df, transform(get(df), Title =  gsub(" |\\.|'|:|!|\\'|_", "", Title)))
}

Testing:

df_1
  id            Title
1  1 someaveragetitle
2  2       anotherone
3  3         thethird
4  4       andthelast

And:

df_2
  id            Title
1  1 someaveragetitle
2  2       anotherone
3  3         thethird
4  4       andthelast
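
A small sketch of why the loop in the question changes nothing while this one works: assign() treats its first argument as a literal variable name, so "df_1$Title" becomes a separate object with an odd name rather than a column of df_1 (get() looks names up the same literal way). The toy data and the shortened pattern below are only for illustration.

```r
df_1 <- data.frame(id = 1:2, Title = c("a_b", "c:d"),
                   stringsAsFactors = FALSE)

# Creates a new variable literally named `df_1$Title`;
# the Title column of df_1 is not touched
assign("df_1$Title", gsub("_|:", "", df_1$Title))
exists("df_1$Title")   # a separate object now exists under that name
df_1$Title             # still the original strings

# Working on the whole object by name: get it, modify it, assign it back
assign("df_1", transform(get("df_1"), Title = gsub("_|:", "", Title)))
df_1$Title             # now cleaned
```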

Upvotes: 1

Ricardo Saporta

Reputation: 55350

get() will allow you to grab your multiple datasets programmatically.
data.table will be helpful for modifying the columns in each with ease.

## CREATING A FEW MORE DATA SETS
df_3 <- df_2
df_4 <- df_1
set.seed(1)
df_3$id <- sample(20, 4)
df_4$id <- sample(20, 4)

library(data.table)

dt_1 <- as.data.table(df_1)
dt_2 <- as.data.table(df_2)
dt_3 <- as.data.table(df_3)
dt_4 <- as.data.table(df_4)

## OR programmatically:

Numb_of_DTs <- 4

names_of_dt_objects <- paste("dt", 1:Numb_of_DTs, sep="_")  # dt_1, dt_2, etc
names_of_df_objects <- paste("df", 1:Numb_of_DTs, sep="_")  # df_1, df_2, etc

for (i in 1:Numb_of_DTs)
  assign(names_of_dt_objects[[i]], as.data.table(get(names_of_df_objects[[i]])))


for (dt.nm in names_of_dt_objects) {
  get(dt.nm)[, Title := gsub("[ .':!_]", "", Title)]
  ## set the key for merging in the next step
  setkey(get(dt.nm), Title)
  ## You might want to insert a line to clean up the column names, using 
  ##   setnames(get(dt.nm), OLD_NAMES, NEW_NAMES)
}


Reduce(merge, lapply(names_of_dt_objects, function(x) get(x)))
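
Note that the loop above uses a character class, "[ .':!_]", in place of the alternation pattern used elsewhere on this page. A quick check on the question's sample titles suggests the two give the same result for this data; the vector name titles is only for illustration.

```r
# Sample titles taken from the question's two data frames
titles <- c("some_average_title", "another:_one", "the_third!",
            "and_'the'_last", "some_average.title", "another:one")

alt <- gsub(" |\\.|'|:|!|\\'|_", "", titles)  # alternation, as in the question
cls <- gsub("[ .':!_]", "", titles)           # character class, as in the loop

identical(alt, cls)  # compare the two results element by element
cls
```

Inside a character class the dot is literal, so it needs no escaping there.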

Upvotes: 0
