CISCO

Reputation: 539

How to remove observations from multiple dataframes and keep as multiple dataframes

I have many data frames. Here is a simplified version of two of them.

flows <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                 Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_18_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                 Feb_18_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_18_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                 Feb_18_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))


returns <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                  Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_20_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                  Feb_20_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_20_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                  Feb_20_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))

I am using lapply to remove some observations. I would like to do this across all my data frames and keep the output as separate data frames, that is, update the existing data frames in place with the selected observations removed.

Here is my current code.

df.list <- list(flows, returns)

lapply(df.list, function(df) df[!grepl("1", df$Class),])

However, this does not update the original data frames; the result comes back as a list rather than as separate data frames. Any help is appreciated.

Upvotes: 0

Views: 77

Answers (2)

Vitali Avagyan

Reputation: 1203

Another solution:

flows <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                 Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_18_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                 Feb_18_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_18_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                 Feb_18_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))

returns <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                  Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_20_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                  Feb_20_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_20_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                  Feb_20_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))

df.list <- list(flows, returns)

Now assign the result of lapply to a variable and name its elements after the original data frames:

a <- lapply(df.list, function(df) df[!grepl("1", df$Class),])
names(a) <- c("flows","returns")

Then call list2env, which copies each named element of the list into the global environment and so overwrites the original data frames:

list2env(a, envir = .GlobalEnv)

Output:

> flows
   Student Class Jan_18_score Feb_18_score Jan_18_Weight Feb_18_Weight
5       Ed     2            1            2            60            80
6     Mick     2           NA            6            80            30
7     Dave     3            5           NA            40            25
8     Nick     3            8            8            12            45
9      Tim     3           -2            7            23            40
10  George     3            5            3            65            NA
11     Tom     3           NA            8            78            50

> returns
   Student Class Jan_20_score Feb_20_score Jan_20_Weight Feb_20_Weight
5       Ed     2            1            2            60            80
6     Mick     2           NA            6            80            30
7     Dave     3            5           NA            40            25
8     Nick     3            8            8            12            45
9      Tim     3           -2            7            23            40
10  George     3            5            3            65            NA
11     Tom     3           NA            8            78            50

Checking classes of the outputs:

> class(returns)
[1] "data.frame"

> class(flows)
[1] "data.frame"
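
As a possible variation on the steps above, the names can be attached when the list is first built, since lapply carries the names of its input through to its result. The sketch below is only illustrative: the small data frames are hypothetical stand-ins for the ones in the question, and `env` and `filtered` are names introduced here.

```r
# Small stand-in data frames (hypothetical, not the data from the question)
flows   <- data.frame(Student = c("Adam", "Ed", "Dave"), Class = c(1L, 2L, 3L))
returns <- data.frame(Student = c("Adam", "Ed", "Dave"), Class = c(1L, 2L, 3L))

# Name the list elements up front so no separate names<- step is needed
df.list <- list(flows = flows, returns = returns)

# lapply() keeps the names of its input on the returned list
filtered <- lapply(df.list, function(df) df[!grepl("1", df$Class), ])

# environment() is the global environment when this runs at top level
env <- environment()

# Copy each named element over the variable of the same name
list2env(filtered, envir = env)
```

After this runs, flows and returns are still ordinary data frames, each with the Class 1 rows removed.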

Upvotes: 2

Simon Woodward

Reputation: 2026

I'm not sure about using lapply, but you can work with variables by name using get and assign, looping over a list of their names.

flows <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                        Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_18_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                        Feb_18_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_18_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                        Feb_18_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))


returns <- structure(list(Student = c("Adam", "Char", "Fred", "Greg", "Ed", "Mick", "Dave", "Nick", "Tim", "George", "Tom"), 
                          Class = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), Jan_20_score = c(NA, 5L, -7L, 2L, 1L, NA, 5L, 8L, -2L, 5L, NA), 
                          Feb_20_score = c(2L,   0, 8L, NA, 2L, 6L, NA, 8L, 7L, 3L, 8L), Jan_20_Weight = c(150L, 30L, NA, 80L, 60L, 80L, 40L, 12L, 23L, 65L, 78L), 
                          Feb_20_Weight = c(153L, 60L, 80L, 40L, 80L, 30L, 25L, 45L, 40L, NA, 50L)), class = "data.frame", row.names = c(NA, -11L))

df.list <- list("flows", "returns")

for (df.name in df.list){
    temp <- get(df.name)
    temp <- temp[!grepl("1", temp$Class), ]
    assign(paste0(df.name, "_new"), temp)
}

To overwrite the original variables instead, drop the "_new" suffix, i.e. use assign(df.name, temp).
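
A minimal sketch of that overwriting variant follows. It also swaps grepl("1", ...) for an exact comparison, because grepl matches substrings and would drop a Class of 10 or 21 as well. That swap is a suggestion rather than part of the original answer, and it assumes Class contains no NA values. The stand-in data and the name `env` are hypothetical.

```r
# Stand-in data with a Class of 10 to show the difference (hypothetical)
flows   <- data.frame(Student = c("Adam", "Ed", "Zoe"), Class = c(1L, 2L, 10L))
returns <- data.frame(Student = c("Adam", "Ed", "Zoe"), Class = c(1L, 2L, 10L))

# environment() is the global environment when this runs at top level
env <- environment()

for (df.name in c("flows", "returns")) {
    temp <- get(df.name, envir = env)
    # Exact comparison keeps Class 10; grepl("1", ...) would remove it too
    temp <- temp[temp$Class != 1, ]
    # Assigning to the same name replaces the original data frame
    assign(df.name, temp, envir = env)
}
```

With many data frames, the vector of names could come from ls() with a suitable pattern, provided the pattern matches only the data frames that should be filtered.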

Upvotes: 1
