G_T

Reputation: 1587

Looping with tidyr

I have some data from Wikipedia:

RHCP_data
                V1              V2              V3           V4
1       bar:kiedis from:01/01/1983 till:01/11/1986 color:vocals
2       bar:kiedis from:01/12/1986        till:end color:vocals
3         bar:flea from:01/01/1983        till:end   color:bass
4        bar:smith from:03/12/1988        till:end  color:drums
5  bar:klinghoffer from:01/10/2009        till:end   color:lead
6       bar:slovak from:01/01/1983 till:01/12/1983   color:lead
7       bar:slovak from:01/02/1985 till:25/06/1988   color:lead
...
...

I am trying to use tidyr to strip the variable-name prefixes (bar:, from:, till:, color:) from the values, and this works great for a single column:

separate(RHCP_data, "V1", into = c("a", "b"), sep = ":")[2]

             b
1       kiedis
2       kiedis
3         flea
4        smith
5  klinghoffer
6       slovak
7       slovak
...
...

I would like to understand why the same call does not work inside a loop over all four columns:

for(i in 1:4){
  RHCP_data[,i] <- separate(RHCP_data, paste0("V", i), into = c("a", "b"), sep = ":")[2][,1]
}

and I get this error:

Error: Invalid column specification

The dataset is small, so this is not a real problem here, but I feel there is something about tidyr or loops that I don't understand. Any help is appreciated.

Upvotes: 1

Views: 872

Answers (2)

akrun

Reputation: 887251

We can simply use cSplit without any loop.

library(splitstackshape)
DT <- cSplit(RHCP_data, 1:ncol(RHCP_data), ':')
DT[, seq(2, ncol(DT), by=2), with=FALSE]
#          V1_2       V2_2       V3_2   V4_2
#1:      kiedis 01/01/1983 01/11/1986 vocals
#2:      kiedis 01/12/1986        end vocals
#3:        flea 01/01/1983        end   bass
#4:       smith 03/12/1988        end  drums
#5: klinghoffer 01/10/2009        end   lead
#6:      slovak 01/01/1983 01/12/1983   lead
#7:      slovak 01/02/1985 25/06/1988   lead

Upvotes: 3

Colonel Beauvel

Reputation: 31171

To pass a column name held in a variable (that is, as a string), you need to use separate_ instead of separate.

And rather than a for loop, I would recommend lapply:

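library(tidyr)

# df is the data from the question (RHCP_data)
lst = lapply(seq(ncol(df)), function(x) {
    # separate_ takes the column name as a string; the two new columns
    # replace V<x> in place, so they sit at positions x and x+1
    separate_(df, paste0('V', x), into = paste0(c("a", "b"), x), sep = ":")[x:(x+1)][,2]
})

data.frame(setNames(lst, names(df)))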
#           V1         V2         V3     V4
#1      kiedis 01/01/1983 01/11/1986 vocals
#2      kiedis 01/12/1986        end vocals
#3        flea 01/01/1983        end   bass
#4       smith 03/12/1988        end  drums
#5 klinghoffer 01/10/2009        end   lead
#6      slovak 01/01/1983 01/12/1983   lead
#7      slovak 01/02/1985 25/06/1988   lead
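
If the only goal is to drop the "name:" prefix from every column, the same column-by-column idea can also be sketched in base R without tidyr. This swaps separate_ for a plain regular-expression substitution with sub. The three-row stand-in data frame and the helper name strip_prefix are assumptions made for illustration, not part of the original question or answer.

```r
# Stand-in for RHCP_data: the first three rows shown in the question.
RHCP_data <- data.frame(
  V1 = c("bar:kiedis", "bar:kiedis", "bar:flea"),
  V2 = c("from:01/01/1983", "from:01/12/1986", "from:01/01/1983"),
  V3 = c("till:01/11/1986", "till:end", "till:end"),
  V4 = c("color:vocals", "color:vocals", "color:bass"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove everything up to and including the first ":".
strip_prefix <- function(col) sub("^[^:]*:", "", col)

# Assigning into cleaned[] keeps the data frame shape and column names
# while replacing each column with its stripped version.
cleaned <- RHCP_data
cleaned[] <- lapply(RHCP_data, strip_prefix)

print(cleaned)
```

Because the pattern only removes text up to the first colon, values that themselves contain a colon later on would keep that part; whether that is the desired behaviour depends on the data.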

Upvotes: 3
