Reputation: 17
I made a list of data frames so that I can apply the same function to every data frame in it.
First, these are the libraries I loaded (I know many of them are not actually used in this code, but anyway):
library(ggplot2)
library(dplyr)
library(ggpmisc)
library(plotrix)
library(tidyverse)
library(lubridate)
library(ggrepel)
library(broom)
library(plotly)
library(reprex)
library(readxl)
library(zoo)
library(pracma)
I have about 100 CSV files in my working directory. The file names have the format "LineNum_(number).csv".
list.files()
'0523_visualize.ipynb''LineNum_101.csv''LineNum_102.csv'(...)
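Since the directory also holds other files (such as the notebook), a pattern can restrict the listing to the line files. This is only a minimal sketch, assuming the files really follow the `LineNum_<number>.csv` naming; `demo_dir`, `csv_files` and `line_ids` are hypothetical names, and the temporary directory with empty files exists only to make the example self-contained:

```r
# Self-contained demo: create a few empty files in a temporary directory
# (hypothetical stand-ins for the real working directory).
demo_dir <- tempfile("linefiles_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("0523_visualize.ipynb",
                                  "LineNum_101.csv",
                                  "LineNum_102.csv")))

# Keep only files named LineNum_<digits>.csv
csv_files <- list.files(demo_dir, pattern = "^LineNum_[0-9]+\\.csv$")

# Extract the line number from each file name, e.g. "LineNum_101.csv" -> "101"
line_ids <- sub("^LineNum_([0-9]+)\\.csv$", "\\1", csv_files)

print(csv_files)
print(line_ids)
```

In the real working directory, `list.files(pattern = ...)` without the first argument would do the same, and `line_ids` could later serve as names for the list of data frames.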
Each file looks like this:
df_Line311 <-read.csv("LineNum_311.csv", encoding = "UTF-8")
head(df_Line311, 5)
A data.frame: 5 × 5
Date On Off Transfer LineNum
<chr> <int> <int> <int> <int>
1 2020-01-02 15623 12250 3288 311
2 2020-01-03 16598 13078 3410 311
3 2020-01-04 12081 9771 2296 311
4 2020-01-05 9543 7556 1835 311
5 2020-01-06 14779 11607 3321 311
df_Line101 <-read.csv("LineNum_101.csv", encoding = "UTF-8")
head(df_Line101,5)
A data.frame: 5 × 5
Date On Off Transfer LineNum
<chr> <int> <int> <int> <int>
1 2020-01-02 4250 3725 1061 101
2 2020-01-03 4463 3910 1099 101
3 2020-01-04 3214 2847 753 101
4 2020-01-05 2977 2562 660 101
5 2020-01-06 4197 3673 1041 101
... and so on.
Here the On/Off/Transfer variables are the number of people who got on, got off, or transferred on the bus line LineNum. For example, on 2020-01-02, 15623 people got on bus line 311.
I successfully made a list of data frames and applied my own function (function_Merged(df)) to each element:
my_list <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")
my_df <- lapply(my_list, function(x) read.csv(x, encoding = "UTF-8"))
# assign the result back, otherwise the columns added by function_Merged are discarded
my_df <- lapply(my_df, function_Merged)
summary(my_df)
Length Class Mode
[1,] 5 data.frame list
[2,] 5 data.frame list
[3,] 5 data.frame list
my_df[1]
Date On Off Transfer LineNum Days Workdays On_RunMed NumericDate Loess_Fit Loess_SE
<date> <int> <int> <int> <int> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
1 2020-01-02 4250 3725 1061 101 목요일 TRUE 4250 18263 4206.122 58.33396
6 2020-01-07 3980 3436 945 101 화요일 TRUE 4250 18268 4325.767 41.86618
7 2020-01-08 4382 3805 1052 101 수요일 TRUE 4382 18269 4344.186 39.22043
8 2020-01-09 4473 3957 1111 101 목요일 TRUE 4382 18270 4360.792 36.81093
But now I have another serious problem: I have to split each of those data frames into three phases based on the date, one by one. For example:
df_Line102 <-read.csv("LineNum_102.csv", encoding = "UTF-8")
df_102_Merged <- function_Merged(df_Line102)
# this df_102_Merged data frame was made manually (it is not in the list)
# its shape is the same as that of the list elements
# and I have to do this for all 100 data frames in my_list
df_102_Merged_Phase1 <-
df_102_Merged["2020-02-18" <= df_102_Merged$Date
& df_102_Merged$Date <= "2020-08-09",]
df_102_Merged_Phase2 <-
df_102_Merged["2020-08-10" <= df_102_Merged$Date
& df_102_Merged$Date <= "2020-11-17",]
df_102_Merged_Phase3 <-
df_102_Merged["2020-11-18" <= df_102_Merged$Date
& df_102_Merged$Date <= "2021-07-04",]
So, please help me. Is there any way (a new function or a package) to split each data frame and put the pieces into another list of data frames?
I would also very much appreciate a way to export those data frames from the list, all with a fixed naming format such as df_102_phase3, df_301_phase2, etc., i.e. df_(linename)_phase(n).
Upvotes: 0
Views: 56
Reputation:
You can generate dataframes of dataframes and nest/unnest (drill down into the "dataframelets" and back), together with the usual dataframe operations (e.g. summarising by groups), with little code like this:
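library(dplyr)
library(tidyr)
# one row per file; line_data holds each file's contents as a nested data frame
data.frame(filename = c("LineNum_101.csv",
                        "LineNum_102.csv",
                        "LineNum_103.csv")) %>%
  rowwise() %>%
  mutate(line_data = list(read.csv(filename))) %>%
  # expand the nested data frames back into one long data frame
  unnest(line_data)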
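The phase split asked about in the question could be handled with `cut()` and `split()` from base R. The following is only a sketch under assumptions: `merged_list`, `make_demo`, `phase_breaks` and `split_phases` are hypothetical names, the demo data stands in for the output of `function_Merged()`, the `Date` column is assumed to be of class `Date`, and the list is assumed to be named by line number.

```r
# Hypothetical stand-in for the list of merged data frames (my_df in the
# question), named by line number; real data would come from function_Merged().
make_demo <- function(line) {
  data.frame(
    Date    = as.Date(c("2020-01-02", "2020-02-18", "2020-08-10",
                        "2020-11-18", "2021-07-04")),
    On      = c(10L, 20L, 30L, 40L, 50L),
    LineNum = line
  )
}
merged_list <- list("101" = make_demo(101L), "102" = make_demo(102L))

# Phase boundaries from the question: each phase starts on its break date and
# ends the day before the next one (so phase 3 ends on 2021-07-04).
phase_breaks <- as.Date(c("2020-02-18", "2020-08-10",
                          "2020-11-18", "2021-07-05"))

split_phases <- function(df) {
  # right = FALSE gives intervals of the form [start, next_start)
  phase <- cut(df$Date, breaks = phase_breaks, right = FALSE,
               labels = paste0("phase", 1:3))
  # rows outside every phase get NA and are dropped by split()
  split(df, phase)
}

# One list per line, each holding three data frames
phases_nested <- lapply(merged_list, split_phases)

# Flatten one level; names come out as "101.phase1", "101.phase2", ...
phases_flat <- unlist(phases_nested, recursive = FALSE)

# Rename to the requested df_<line>_phase<n> format
names(phases_flat) <- paste0("df_", sub("\\.", "_", names(phases_flat)))

print(names(phases_flat))
```

Each piece is then available as, for example, `phases_flat$df_102_phase3`. If separate objects are really wanted, `list2env(phases_flat, envir = .GlobalEnv)` would create them, although keeping everything in one named list is usually easier to loop over.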
Upvotes: 0