JooHee Lee

Reputation: 17

R splitting dataframes in a list

I made a list of dataframes to apply the same function to all the dataframe elements.

First, these are the libraries I used (I know many of them are not actually needed for this code, but anyway):

library(ggplot2)
library(dplyr)
library(ggpmisc)
library(plotrix)
library(tidyverse)
library(lubridate)
library(ggrepel)
library(broom)
library(plotly)
library(reprex)
library(readxl)
library(zoo)
library(pracma)

I have about 100 CSV files in my working directory. The file names have the format "LineNum_(number).csv":

list.files()
'0523_visualize.ipynb' 'LineNum_101.csv' 'LineNum_102.csv' (...)

each file looks like this:

df_Line311 <- read.csv("LineNum_311.csv", encoding = "UTF-8")
head(df_Line311, 5)

A data.frame: 5 × 5
  Date        On     Off    Transfer  LineNum
  <chr>       <int>  <int>  <int>     <int>
1 2020-01-02  15623  12250  3288      311
2 2020-01-03  16598  13078  3410      311
3 2020-01-04  12081  9771   2296      311
4 2020-01-05  9543   7556   1835      311
5 2020-01-06  14779  11607  3321      311

df_Line101 <- read.csv("LineNum_101.csv", encoding = "UTF-8")
head(df_Line101, 5)

A data.frame: 5 × 5
  Date        On     Off    Transfer  LineNum
  <chr>       <int>  <int>  <int>     <int>
1 2020-01-02  4250   3725   1061      101
2 2020-01-03  4463   3910   1099      101
3 2020-01-04  3214   2847   753       101
4 2020-01-05  2977   2562   660       101
5 2020-01-06  4197   3673   1041      101

... and so on.
Here the On/Off/Transfer variables are the numbers of people who got on, got off, or transferred to bus line LineNum. For example, on 2020-01-02, 15623 people got on bus line 311.

I successfully made a list of dataframes and applied my own function (function_Merged(df)) to each element:

my_list <- c("LineNum_101.csv", "LineNum_102.csv", "LineNum_103.csv")
# read every file into one list of dataframes
my_df <- lapply(my_list, read.csv, encoding = "UTF-8")
# assign the result back, otherwise my_df still holds the raw dataframes
my_df <- lapply(my_df, function_Merged)

summary(my_df)
     Length Class      Mode
[1,] 5      data.frame list
[2,] 5      data.frame list
[3,] 5      data.frame list

my_df[1]
  Date        On     Off    Transfer  LineNum  Days    Workdays  On_RunMed  NumericDate  Loess_Fit  Loess_SE
  <date>      <int>  <int>  <int>     <int>    <chr>   <fct>     <dbl>      <dbl>        <dbl>      <dbl>
1 2020-01-02  4250   3725   1061      101      목요일  TRUE      4250       18263        4206.122   58.33396
6 2020-01-07  3980   3436   945       101      화요일  TRUE      4250       18268        4325.767   41.86618
7 2020-01-08  4382   3805   1052      101      수요일  TRUE      4382       18269        4344.186   39.22043
8 2020-01-09  4473   3957   1111      101      목요일  TRUE      4382       18270        4360.792   36.81093

But now I have another serious problem: I have to split each of those dataframe elements into three phases based on the date. Done by hand for a single dataframe, it looks like this:

df_Line102 <- read.csv("LineNum_102.csv", encoding = "UTF-8")
df_102_Merged <- function_Merged(df_Line102)
# df_102_Merged is built by hand here (it is not taken from the list),
# but it has the same shape as the list elements

# I would have to repeat this for all 100 files
df_102_Merged_Phase1 <-
  df_102_Merged["2020-02-18" <= df_102_Merged$Date
                & df_102_Merged$Date <= "2020-08-09", ]

df_102_Merged_Phase2 <-
  df_102_Merged["2020-08-10" <= df_102_Merged$Date
                & df_102_Merged$Date <= "2020-11-17", ]

df_102_Merged_Phase3 <-
  df_102_Merged["2020-11-18" <= df_102_Merged$Date
                & df_102_Merged$Date <= "2021-07-04", ]

So, umm... please help. Is there any way (a function or a package) to split each dataframe and put the pieces into another list of dataframes?

I would also very much appreciate a way to export the dataframe elements of that list under a fixed name format, i.e. df_(linename)_phase(n), such as df_102_phase3, df_301_phase2, etc.

Upvotes: 0

Views: 56

Answers (1)

user18309711

You can build a dataframe of dataframes and nest/unnest it (drill down into the "dataframelets" and back up), combined with the usual dataframe operations (e.g. summarising by groups), with little code, like this:

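library(dplyr)
library(tidyr)

# one row per file; each line_data cell holds that file's dataframe
data.frame(filename = c("LineNum_101.csv",
                        "LineNum_102.csv",
                        "LineNum_103.csv")) %>%
  rowwise() %>%
  mutate(line_data = list(read.csv(filename))) %>%
  # expand the nested dataframes back into one long dataframe
  unnest(line_data)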
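
For the phase split itself, here is a minimal base-R sketch, not a definitive implementation. It assumes each dataframe has a Date column and a LineNum column as in the question. The helper names make_line, split_phases, phase_breaks, merged_list and flat are made up for illustration, and make_line only fabricates stand-in data so the example runs without the CSV files or function_Merged. The phase boundaries are the ones given in the question; the last break is one day after the end of phase 3 because the intervals are closed on the left only.

```r
# Hypothetical stand-in for one merged dataframe (real ones would come from function_Merged()).
make_line <- function(line_num) {
  dates <- seq(as.Date("2020-01-02"), as.Date("2021-07-04"), by = "day")
  data.frame(Date = dates, On = seq_along(dates), LineNum = line_num)
}
merged_list <- lapply(c(101, 102), make_line)

# Phase start dates from the question, plus the day after the last phase ends.
phase_breaks <- as.Date(c("2020-02-18", "2020-08-10", "2020-11-18", "2021-07-05"))

# Split one dataframe into a list with elements phase1, phase2, phase3.
split_phases <- function(df) {
  df$Date <- as.Date(df$Date)  # harmless if Date is already a Date
  phase <- cut(df$Date, breaks = phase_breaks,
               labels = paste0("phase", 1:3), right = FALSE)
  split(df, phase)             # rows outside every phase (NA) are dropped
}

phase_list <- lapply(merged_list, split_phases)
names(phase_list) <- vapply(merged_list,
                            function(df) paste0("df_", df$LineNum[1]),
                            character(1))

# Flatten to a single named list: df_101_phase1, df_101_phase2, ...
flat <- unlist(phase_list, recursive = FALSE)
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)

# Optional: create one variable per element in the global environment.
# list2env(flat, envir = globalenv())
```

Keeping everything in the named list (flat[["df_102_phase3"]] and so on) is usually easier to loop over than 300 separate variables, but the commented list2env() line gives the df_(linename)_phase(n) objects if they are really wanted.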

Upvotes: 0
