Reputation: 1925
I am new to R, so some of my concepts may not be fully correct. I have a set of files that I read into a list (only the first 3 lines of each are shown here):
myfiles<-lapply(list.files(".",pattern="tab",full.names=T),read.table,comment.char="#")
myfiles
[[1]]
V1 V2 V3
1 10001 33 -0.0499469
2 30001 65 0.0991478
3 50001 54 0.1564400
[[2]]
V1 V2 V3
1 10001 62 0.0855260
2 30001 74 0.1536640
3 50001 71 0.1020960
[[3]]
V1 V2 V3
1 10001 49 -0.04661360
2 30001 65 0.16961500
3 50001 61 0.07089600
I want to apply an ifelse condition in order to substitute values in columns and then return exactly the same list. However, when I do this:
myfiles<-lapply(myfiles,function(x) ifelse(x$V2>50, x$V3, NA))
myfiles
[[1]]
[1] NA 0.0991478 0.1564400
[[2]]
[1] 0.0855260 0.1536640 0.1020960
[[3]]
[1] NA 0.16961500 0.07089600
it does in fact do what I want, but it returns only the column the function was applied to. I want it to return the same list as before, with 3 columns (but with the substitutions).
I guess there should be an easy way to do this with some variant of "apply", but I was not able to find it or solve it.
Thanks
Upvotes: 2
Views: 6417
Reputation: 24535
Try adding an id column for each df and binding them together:
for(i in seq_along(myfiles)) myfiles[[i]]$id = i
ddf = myfiles[[1]]
for(i in seq_along(myfiles)[-1]) ddf = rbind(ddf, myfiles[[i]])
Then apply the changes to the composite df and split it back again:
ddf$V3 = ifelse(ddf$V2>50, ddf$V3, NA)
myfiles = lapply(split(ddf, ddf$id), function(x) x[1:3])
myfiles
$`1`
V1 V2 V3
1 10001 33 NA
2 30001 65 0.0991478
3 50001 54 0.1564400
$`2`
V1 V2 V3
11 10001 62 0.085526
21 30001 74 0.153664
31 50001 71 0.102096
$`3`
V1 V2 V3
12 10001 49 NA
22 30001 65 0.169615
32 50001 61 0.070896
Upvotes: 0
Reputation: 886978
Perhaps this helps
lapply(myfiles,within, V3 <- ifelse(V2 >50, V3, NA))
#[[1]]
# V1 V2 V3
#1 10001 33 NA
#2 30001 65 0.0991478
#3 50001 54 0.1564400
#[[2]]
# V1 V2 V3
#1 10001 62 0.085526
#2 30001 74 0.153664
#3 50001 71 0.102096
#[[3]]
# V1 V2 V3
#1 10001 49 NA
#2 30001 65 0.169615
#3 50001 61 0.070896
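If `within` feels opaque, the same idea can be spelled out with an anonymous function. The original attempt returned only a vector because the value of `ifelse()` was the last expression in the function; assigning it back to `V3` and then returning `x` keeps all three columns. This is only a sketch: the toy list below is an assumption standing in for the files read from disk, using values from the question.

```r
# Toy data standing in for the data frames read by read.table
# (an assumption for illustration; values copied from the question).
myfiles <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.0499469, 0.0991478, 0.1564400)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(62, 74, 71),
             V3 = c(0.0855260, 0.1536640, 0.1020960))
)

# Overwrite V3 inside the function, then return the whole data frame.
res <- lapply(myfiles, function(x) {
  x$V3 <- ifelse(x$V2 > 50, x$V3, NA)
  x  # the last expression is what lapply collects
})
```

Each element of `res` is still a data frame with columns V1, V2 and V3.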
Another option would be to read the files using fread from data.table, which would be fast:
library(data.table)
files <- list.files(pattern='tab')
lapply(files, function(x) fread(x)[V2<=50,V3:=NA] )
#[[1]]
# V1 V2 V3
#1: 10001 33 NA
#2: 30001 65 0.0991478
#3: 50001 54 0.1564400
#[[2]]
# V1 V2 V3
#1: 10001 62 0.085526
#2: 30001 74 0.153664
#3: 50001 71 0.102096
#[[3]]
# V1 V2 V3
#1: 10001 49 NA
#2: 30001 65 0.169615
#3: 50001 61 0.070896
Or, as @Richie Cotton mentioned, you could also bind the datasets together using rbindlist and then do the operation in one step.
library(tools)
dt1 <- rbindlist(lapply(files, function(x)
fread(x)[,id:= basename(file_path_sans_ext(x))] ))[V2<=50, V3:=NA]
dt1
# V1 V2 V3 id
#1: 10001 33 NA tab1
#2: 30001 65 0.0991478 tab1
#3: 50001 54 0.1564400 tab1
#4: 10001 62 0.0855260 tab2
#5: 30001 74 0.1536640 tab2
#6: 50001 71 0.1020960 tab2
#7: 10001 49 NA tab3
#8: 30001 65 0.1696150 tab3
#9: 50001 61 0.0708960 tab3
Upvotes: 3
Reputation: 121057
This seems harder than it should be because you are working with a list of data frames rather than a single data frame. You can combine all the data frames into a single one using bind_rows in dplyr (this was rbind_all in older dplyr versions, which has since been deprecated).
library(dplyr)
# Some variable renaming for clarity:
# myfiles now refers to the file names; mydata now contains the data
myfiles <- list.files(pattern="tab", full.names=TRUE)
mydata <- lapply(myfiles, read.table, comment.char="#")
# Get the number of rows in each data frame
n_rows <- vapply(mydata, nrow, integer(1))
# Combine the list of data frames into a single data frame
all_mydata <- bind_rows(mydata)
# Add an identifier to see which data frame the row came from.
# times (not each) repeats each file name by its own row count.
all_mydata$file <- rep(myfiles, times = n_rows)
# Now update column 3: set V3 to NA wherever V2 is not above 50
is.na(all_mydata$V3) <- all_mydata$V2 <= 50
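For anyone without dplyr, here is a rough base-R sketch of the same combine, label, update and split-back idea. The toy data frames and the file names `tab1.tab` and `tab2.tab` are made up for illustration, and `do.call(rbind, ...)` is swapped in for the dplyr binding step. Note that `rep()` needs `times =` to repeat each name by its own row count; `each =` only accepts a single number, so it goes wrong as soon as the files differ in length.

```r
# Toy stand-ins for the data read from disk (hypothetical values and names).
mydata <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.05, 0.10, 0.16)),
  data.frame(V1 = c(10001, 30001), V2 = c(62, 74),
             V3 = c(0.09, 0.15))
)
myfiles <- c("tab1.tab", "tab2.tab")

# Row count of each data frame (3 and 2 here, deliberately unequal).
n_rows <- vapply(mydata, nrow, integer(1))

# Stack everything into one data frame and label each row with its file.
all_mydata <- do.call(rbind, mydata)
all_mydata$file <- rep(myfiles, times = n_rows)

# Set V3 to NA wherever V2 is not above 50.
is.na(all_mydata$V3) <- all_mydata$V2 <= 50

# Split back into a list, keeping the original file order and columns.
back <- split(all_mydata[c("V1", "V2", "V3")],
              factor(all_mydata$file, levels = myfiles))
```

`back` is then a named list with one data frame per file, which is close to the structure the question started from.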
Upvotes: 1
Reputation: 81683
You can use lapply with transform or within. There are three possibilities:
a) ifelse
lapply(myfiles, transform, V3 = ifelse(V2 > 50, V3, NA))
b) mathematical operators (potentially more efficient)
lapply(myfiles, transform, V3 = NA ^ (V2 <= 50) * V3)
c) is.na<-
lapply(myfiles, within, is.na(V3) <- V2 <= 50)
The result
[[1]]
V1 V2 V3
1 10001 33 NA
2 30001 65 0.0991478
3 50001 54 0.1564400
[[2]]
V1 V2 V3
1 10001 62 0.085526
2 30001 74 0.153664
3 50001 71 0.102096
[[3]]
V1 V2 V3
1 10001 49 NA
2 30001 65 0.169615
3 50001 61 0.070896
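To see why option b) works: in R, `NA ^ 0` is 1 and `NA ^ 1` is NA, so `NA ^ (V2 <= 50)` gives NA where the condition holds and 1 elsewhere, and multiplying by V3 either blanks the value or leaves it unchanged. The sketch below runs all three variants on toy data (an assumption standing in for the real files, with values taken from the question) and compares them.

```r
# Toy data standing in for the list read from disk (assumed for illustration).
myfiles <- list(
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(33, 65, 54),
             V3 = c(-0.0499469, 0.0991478, 0.1564400)),
  data.frame(V1 = c(10001, 30001, 50001), V2 = c(49, 65, 61),
             V3 = c(-0.0466136, 0.1696150, 0.0708960))
)

# The building block of option b): TRUE maps to NA, FALSE maps to 1.
mask <- NA ^ c(TRUE, FALSE)

# The three variants, all using "V2 <= 50" as the NA condition.
res_a <- lapply(myfiles, transform, V3 = ifelse(V2 > 50, V3, NA))
res_b <- lapply(myfiles, transform, V3 = NA ^ (V2 <= 50) * V3)
res_c <- lapply(myfiles, within, is.na(V3) <- V2 <= 50)

# Compare the variants against each other.
same_ab <- isTRUE(all.equal(res_a, res_b))
same_ac <- isTRUE(all.equal(res_a, res_c))
```

Which variant to pick is mostly a matter of taste; `ifelse` is probably the most readable for someone new to R.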
Upvotes: 3