Reputation: 111
I have some data on streets, and I ran this R code to read the contents of 38 CSV files into a list (more files will be added in the future):
common_path <- "0_data/source_data/DB/Speed/"
csv_files <- list.files(
path = common_path, # directory to search within
pattern = ".*(1|2).*csv$", # file names containing a 1 or a 2 and ending in "csv"
recursive = TRUE, # search subdirectories
full.names = TRUE # return the full path
)
data_lst = lapply(csv_files, read.csv2)
Here is the head of one of the data frames in a reproducible format:
structure(list(typ = c(100L, 100L, 100L, 100L, 100L, 100L, 100L,
100L, 100L, 1L, 1L, 1L, 1L, 1L, 1L), date.and.time = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("2019/11/07 18:07:27.000",
"2019/11/07 18:07:36.290", "2019/11/07 18:07:40.030", "2019/11/07 18:07:41.930",
"2019/11/07 18:07:43.720", "2019/11/07 18:07:46.380", "2019/11/07 18:07:54.010"
), class = "factor"), speed..km.h. = c(NA, NA, NA, NA, NA, NA,
NA, NA, NA, 42L, 44L, 43L, 42L, 41L, 43L), length..m. = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, 3.2, 4.2, 3.2, 3.9, 3.7, 3.2),
range..m. = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, 0L,
0L, 0L, 0L, 0L), notes = c("Serial No = 1", "Direction = NORTH",
"Counting type = SINGLE LANE", "Ref count sense = IN", "Install height = 42 decimeter",
"Axis distance = 58 decimeter", "Road type = STANDARD", "Road slope = FLAT",
"Start of campain", "", "", "", "", "", "")), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15"), class = "data.frame")
What I want to do is:
Get the information of the first 9 rows from the "notes" column
Add information from the "notes" column as separate variables
After that, delete the first 9 rows, or basically all rows where the column "typ" == 100
I can do this by hand for the object in the list with no problem, as in the code below:
data_lst[[1]]$serial <- data_lst[[1]]$notes[1]
data_lst[[1]]$direction <- data_lst[[1]]$notes[2]
data_lst[[1]]$lane <- data_lst[[1]]$notes[3]
data_lst[[1]]$install_height <- data_lst[[1]]$notes[5]
data_lst[[1]]$axis <- data_lst[[1]]$notes[6]
data_lst[[1]]$notes <- NULL
data_lst[[1]] <- data_lst[[1]][-c(1:9),]
But problems arise when I try to loop this process, as I am very inexperienced with loops. I did something like this,
for(i in data_lst){
data_lst[[i]]$serial <- data_lst[[i]]$notes[1]
}
to obtain the "serial" information from my data, but I got this error:
error:
in data_lst[[i]] : invalid subscript type 'list'
Any help is warmly welcome :)
Upvotes: 2
Views: 1982
Reputation: 173793
If you want to do something fairly complex for each entry in a list, it is a good idea to separate out the logic you wish to apply to each entry by writing a function. This makes your code more readable, more modular, and easier to debug or modify in the future.
In your case, you could write a function to operate on each data frame in your list to create a named list of different components: all the named notes you want, plus your modified data frame. Perhaps something like this:
change_data_frame_to_named_list <- function(old_frame)
{
return(list(serial = old_frame$notes[1],
direction = old_frame$notes[2],
lane = old_frame$notes[3],
install_height = old_frame$notes[5],
xaxis = old_frame$notes[6],
data = old_frame[old_frame$typ != 100, -6] # drop the header rows (typ == 100) and column 6 (notes)
))
}
Now all you have to do is apply this function to all the elements in your list. The most idiomatic way to do this in R is not to use a loop at all, but to use lapply (short for list apply). This takes the list as the first argument, and the function you wish to apply to each element as the second argument.
This means you can just do this:
result <- lapply(data_lst, change_data_frame_to_named_list)
This is equivalent to a looped version, but is shorter and neater.
If you really want to do it as a loop, the equivalent would be:
result <- list()
for (i in seq_along(data_lst))
{
result[[i]] = change_data_frame_to_named_list(data_lst[[i]])
}
In either case, the variable result is a list of the same length as data_lst, where each entry is itself a named list, containing your new data frame and its associated named notes.
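To make the shape of such a result concrete, here is a minimal sketch on made-up data. The names toy, toy_lst and split_notes are hypothetical stand-ins (a shortened data frame and a trimmed-down version of the function above), not part of the original data.

```r
# Toy stand-in for one element of data_lst: two header rows (typ == 100)
# followed by two measurement rows. All values are invented for illustration.
toy <- data.frame(
  typ = c(100L, 100L, 1L, 1L),
  speed = c(NA, NA, 42L, 44L),
  notes = c("Serial No = 1", "Direction = NORTH", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical, trimmed-down version of the function: keep two notes and
# return the measurement rows without the notes column.
split_notes <- function(old_frame) {
  list(
    serial = old_frame$notes[1],
    direction = old_frame$notes[2],
    data = old_frame[old_frame$typ != 100, names(old_frame) != "notes"]
  )
}

toy_lst <- list(toy, toy)
result <- lapply(toy_lst, split_notes)

# Each entry is a named list: the notes are single strings, data is a data frame
str(result[[1]])

# Pull the same component out of every entry
directions <- vapply(result, function(x) x$direction, character(1))
```

Accessing result[[i]]$data gives the cleaned data frame for file i, while result[[i]]$serial and the other components hold its notes.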
EDIT
The OP has requested a similar method that returns data in the format already produced by the hand-written code in the question. Here is how this could be accomplished. Since the logic was separated out into the function, we only need to change the function itself:
change_data_frame <- function(old_frame)
{
old_frame$serial <- old_frame$notes[1]
old_frame$direction <- old_frame$notes[2]
old_frame$lane <- old_frame$notes[3]
old_frame$install_height <- old_frame$notes[5]
old_frame$xaxis <- old_frame$notes[6]
old_frame$notes <- NULL
return(old_frame[old_frame$typ != 100, ]) # also safe when no row has typ == 100
}
# Now you just do as you did before
result <- lapply(data_lst, change_data_frame)
# and to get all dfs into one big data frame...
do.call("rbind", result)
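As a quick check of this version, here is a small sketch on made-up data. The names toy, toy2 and flatten_notes are hypothetical; flatten_notes is a shortened variant of the function above that copies only two notes.

```r
# Two toy data frames with the same layout: header rows have typ == 100.
# All values are invented for illustration.
toy <- data.frame(
  typ = c(100L, 100L, 1L, 1L),
  speed = c(NA, NA, 42L, 44L),
  notes = c("Serial No = 1", "Direction = NORTH", "", ""),
  stringsAsFactors = FALSE
)
toy2 <- toy
toy2$notes[1] <- "Serial No = 2"

# Hypothetical shortened variant: copy notes into columns, drop header rows
flatten_notes <- function(old_frame) {
  old_frame$serial <- old_frame$notes[1]       # recycled to every row
  old_frame$direction <- old_frame$notes[2]
  old_frame$notes <- NULL
  old_frame[old_frame$typ != 100, ]
}

result <- lapply(list(toy, toy2), flatten_notes)
combined <- do.call("rbind", result)

# One row per measurement; the serial column still says which file it came from
combined
```

Because the notes are stored as columns before binding, each row of the combined data frame keeps the metadata of the file it came from.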
Upvotes: 1
Reputation: 179
In a for loop, you have to specify the sequence of values that the loop variable takes. To index into each element of the list, that sequence needs to be the positions of the elements, which means you need for (i in seq_along(data_lst)).
Running seq_along(data_lst) will create a sequence from 1 to the number of elements in the list.
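One reason to prefer seq_along over writing 1:length(...) by hand is the empty case. The sketch below uses a hypothetical empty list (for example, if list.files found no matching files) and counts how often each loop body runs.

```r
empty_lst <- list()

seq_along(empty_lst)    # integer(0): nothing to iterate over
1:length(empty_lst)     # 1:0 counts down, giving 1 and 0

# Count iterations with seq_along: the body never runs
n_safe <- 0
for (i in seq_along(empty_lst)) n_safe <- n_safe + 1

# Count iterations with 1:length(): the body runs twice, for i = 1 and i = 0
n_naive <- 0
for (i in 1:length(empty_lst)) n_naive <- n_naive + 1

c(n_safe = n_safe, n_naive = n_naive)
```

For a non-empty list both forms give the same indices, so seq_along costs nothing and avoids this edge case.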
Upvotes: 1
Reputation: 1372
The way you wrote the for loop, i is an element of the list (a data frame, which is itself a list), not an index. If you want the index, you should use the function seq_along to return a vector of indices for your list. Check this example:
> l = list("apple", "banana", "carrot")
> l
[[1]]
[1] "apple"
[[2]]
[1] "banana"
[[3]]
[1] "carrot"
> for(p in l) print(p)
[1] "apple"
[1] "banana"
[1] "carrot"
> for(i in seq_along(l)) print(i)
[1] 1
[1] 2
[1] 3
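Applied to a list of data frames, the same difference can be sketched as follows. The list toy_lst is made up for illustration, and tryCatch is only used here so the failing loop does not stop the script.

```r
# Two tiny made-up data frames standing in for data_lst
toy_lst <- list(
  data.frame(notes = c("Serial No = 1", ""), stringsAsFactors = FALSE),
  data.frame(notes = c("Serial No = 2", ""), stringsAsFactors = FALSE)
)

# Looping over the list itself: i is a data frame, so toy_lst[[i]] is an
# invalid subscript and the loop is expected to fail as in the question.
msg <- tryCatch({
  for (i in toy_lst) toy_lst[[i]]$serial <- toy_lst[[i]]$notes[1]
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Looping over the indices: i is 1, 2, ... so toy_lst[[i]] is valid
for (i in seq_along(toy_lst)) {
  toy_lst[[i]]$serial <- toy_lst[[i]]$notes[1]
}
toy_lst[[2]]
```

After the second loop, every data frame in toy_lst has a serial column filled with its own first note.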
Upvotes: 1