Feixiang Sun

My function doesn't work when there are multiple data frames in the workspace

I wrote a function to process a large dataset. This is my first time writing an R function. It works well if I read only one .csv file, i.e. when there is just one data frame in the workspace. However, if there are two or more data frames, it does not work. It seems the function picks up other data frames even though I pass the data frame to use as an argument.

My function is:

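library(dplyr)
library(stringr)
library(data.table)

Sun <- function(data, A) {
  data$V4 <- as.character(data$V4)
  dt_1 <- data %>%
    mutate(V4 = str_replace(V4, " ", ""))
  dt_2 <- dt_1 %>%
    mutate(V4 = str_replace(V4, "//", ""))
  dt_3 <- dt_2 %>%
    mutate(V4 = str_replace(V4, " ", "("))
  # split the codes on ";" into a character matrix with 100 columns
  dt_4 <- str_split_fixed(dt_3$V4, ";", 100)
  # note: `dt` is not an argument of Sun() and is never created inside it
  dt_5 <- data.frame(dt, dt_4)
  # keep the first four characters of every code column
  dt_6 <- dt_5 %>%
    mutate_at(vars(X1:X100), ~ substr(., 1, 4))
  # drop columns that are empty in every row
  dt_7 <- data.table(dt_6[!sapply(dt_6, function(x) all(x == ""))])
  DT <- dt_7[, -1]
  col_names <- tail(names(DT), -4)
  # one 0/1 column per code in A; the value of this assignment is returned invisibly
  co <- DT[,
           sapply(
             A,
             function(code) { pmin(1, rowSums(.SD == code, na.rm = TRUE)) },
             simplify = FALSE, USE.NAMES = TRUE
           ),
           .SDcols = col_names
           ]
}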
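A side note on the cleaning steps: stringr's str_replace() replaces only the first match in each string, whereas str_replace_all() (used in the answer below) replaces every match. The string x here is a hypothetical code list with extra spaces, chosen only to make the difference visible.

```r
library(stringr)

# hypothetical input with several spaces
x <- "E05B63/14 (2006.01); E05B47/00 (2006.01)"

str_replace(x, " ", "")      # "E05B63/14(2006.01); E05B47/00 (2006.01)": only the first space is removed
str_replace_all(x, " ", "")  # "E05B63/14(2006.01);E05B47/00(2006.01)": every space is removed
```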

For example, suppose I have two data frames in the R workspace at the same time, named DF1 and DF2. Then the following call goes wrong, and I am confused about why:

Sun(DF1, A)

DF1 looks like this:

V1  V2   V3          V4
1   id1  2012.09.28  E05B63/14(2006.01);E05B47/00(2006.01)
2   id2  2010.08.20  G01B5/14(2006.01);G01B5/02(2006.01)
3   id3  2009.01.08  H02J3/00(2006.01);G01R23/02(2006.01)

DF2, for example:

V1  V2   V3          V4
1   id1  2012.09.28  A05B63/14;E05B47/00(2006.01)
2   id2  2010.08.20  D01B5/14
3   id3  2009.01.08  H02J3/00(2006.01);G01R23/02(2006.01)

A is the following vector:

A01B A02B A03B A04B A05B G01B H02J G01R E05B
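To make the example reproducible, DF1, DF2 and A can be rebuilt from the printed tables above. This is a sketch that assumes V1 is an integer and V2 to V4 are character columns; the data actually read from the .csv file may have different types (e.g. factors from read.csv() in older R versions), and running dput(DF1) on the real object is the usual way to share its exact structure.

```r
# DF1, DF2 and A rebuilt from the tables in the question (column types are assumed)
DF1 <- data.frame(
  V1 = 1:3,
  V2 = c("id1", "id2", "id3"),
  V3 = c("2012.09.28", "2010.08.20", "2009.01.08"),
  V4 = c("E05B63/14(2006.01);E05B47/00(2006.01)",
         "G01B5/14(2006.01);G01B5/02(2006.01)",
         "H02J3/00(2006.01);G01R23/02(2006.01)"),
  stringsAsFactors = FALSE
)

DF2 <- data.frame(
  V1 = 1:3,
  V2 = c("id1", "id2", "id3"),
  V3 = c("2012.09.28", "2010.08.20", "2009.01.08"),
  V4 = c("A05B63/14;E05B47/00(2006.01)",
         "D01B5/14",
         "H02J3/00(2006.01);G01R23/02(2006.01)"),
  stringsAsFactors = FALSE
)

A <- c("A01B", "A02B", "A03B", "A04B", "A05B", "G01B", "H02J", "G01R", "E05B")
```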

Answers (1)

Abdessabour Mtk

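All I did was "optimize" the code, and the function works correctly. Again, I think the issue is the use of an undefined `dt` (in `dt_5 <- data.frame(dt, dt_4)`), as I mentioned in the comments: because `dt` is not an argument of Sun(), R looks it up outside the function, so the result depends on whatever object named `dt` happens to exist there.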
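A minimal sketch of that scoping behaviour, using a hypothetical function count_rows() and toy data rather than the original Sun(). A variable that is neither an argument nor created inside the function is looked up in the environment where the function was defined (here the global workspace) and then on the search path. If no global `dt` exists, the lookup presumably ends at stats::dt (the density function of the t distribution), which data.frame() cannot convert, so the original call would fail. The codetools package, which ships with R, can list such free variables.

```r
# Hypothetical function: `data` is an argument, `dt` is not
count_rows <- function(data) {
  # `dt` is a free variable: R ignores `data` here and searches the
  # defining environment (the global workspace) for an object named dt
  nrow(dt)
}

dt  <- data.frame(x = 1:5)  # unrelated object that happens to be named dt
DF1 <- data.frame(x = 1:3)

count_rows(DF1)  # 5, not 3: the global dt was used

# fixed version: only refer to the function's own argument
count_rows_fixed <- function(data) {
  nrow(data)
}
count_rows_fixed(DF1)  # 3

# list the free (global) variables a function depends on
codetools::findGlobals(count_rows, merge = FALSE)$variables
```

Running findGlobals() on the question's Sun() would also list column names such as V4 and X1, since dplyr and data.table evaluate those non-standardly, so its output needs a manual check.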

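library(data.table)
library(stringr)

Sun <- function(data, A) {
  # convert the input to a data.table; only the argument `data` is used
  dt <- data.table(data)
  # remove spaces and "//" from V4, then split the codes on ";" into a 100-column matrix
  splits <- dt[, V4 := str_replace_all(as.character(V4), c(" |//" = "", "//" = ""))][,
    str_split_fixed(V4, ";", 100)
  ]
  # keep the first four characters of every code (substr keeps the matrix shape)
  splits <- data.table(substr(splits, 1, 4))

  # drop the columns that are empty in every row
  rem <- splits[, which(sapply(.SD, function(x) all(!nzchar(x))))]
  splits[, (rem) := NULL]

  # one 0/1 indicator column per code in A
  splits[,
    sapply(
      A,
      function(code) { pmin(1, rowSums(.SD == code, na.rm = TRUE)) },
      simplify = FALSE, USE.NAMES = TRUE
    )]
}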
> Sun(DF1, A)
   A01B A02B A03B A04B A05B G01B H02J G01R E05B
1:    0    0    0    0    0    0    0    0    1
2:    0    0    0    0    0    1    0    0    0
3:    0    0    0    0    0    0    1    1    0
> Sun(DF2, A)
   A01B A02B A03B A04B A05B G01B H02J G01R E05B
1:    0    0    0    0    1    0    0    0    1
2:    0    0    0    0    0    0    0    0    0
3:    0    0    0    0    0    0    1    1    0
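The last step of both versions builds one 0/1 column per code in A with pmin(1, rowSums(.SD == code, na.rm = TRUE)). The base-R sketch below applies the same expression to a small hypothetical character matrix instead of .SD, to show why pmin(1, ...) is there: a record that contains the same code twice would otherwise get a count of 2. Here sapply() simplifies to a matrix; the answer passes simplify = FALSE so that data.table receives a named list and turns it into columns.

```r
# Hypothetical 3 x 2 matrix of four-character codes, one row per record
codes <- matrix(c("E05B", "E05B",
                  "G01B", "G01B",
                  "H02J", "G01R"),
                nrow = 3, byrow = TRUE)
A <- c("E05B", "G01B", "H02J", "G01R", "A01B")

# count matches per row for each code, then cap at 1 so repeated codes still give 1
ind <- sapply(A, function(code) pmin(1, rowSums(codes == code, na.rm = TRUE)))
ind  # 3 x 5 matrix with column names taken from A
```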
