Reputation:
I already know how to load a single CSV into a DataFrame:
using CSV
using DataFrames
df = DataFrame(CSV.File("C:\\Users\\username\\Table_01.csv"))
How would I do this when I have several CSV files, e.g. Table_01.csv, Table_02.csv, Table_03.csv?
Would I create a bunch of empty DataFrames and use a for loop to fill them? Or is there an easier way in Julia? Many thanks in advance!
Upvotes: 5
Views: 1874
Reputation: 1
Here is an example of the open, write, and close process for many files. Reading works in a similar way.
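function main()
    f_max = 365
    data = zeros(Float64, 100, f_max)
    # rand() is evaluated once here, so every entry gets the same random value
    data[:, :] .= rand()
    # Build the file names ./testdata1.dat ... ./testdata365.dat
    filenames = String[]
    for i in 1:f_max
        ci = string(i)
        filename = "./testdata" * ci * ".dat"
        push!(filenames, filename)
    end
    # Open every file for writing
    files = [open(file, "w") for file in filenames]
    # Write column i to file i as raw binary
    for i in 1:f_max
        write(files[i], data[:, i])
    end
    # Close all files
    for i in 1:f_max
        close(files[i])
    end
end
main()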
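Since the answer only says that reading is similar, here is a minimal sketch of the full round trip using only Base Julia. The function name `roundtrip`, the temporary directory, and the small file count are illustrative assumptions, not part of the answer above. Raw binary files carry no shape information, so the reader has to know how many values each file holds.

```julia
# Write each column of a matrix to its own binary file, then read them back.
function roundtrip(dir; nrows = 100, nfiles = 3)
    data = rand(Float64, nrows, nfiles)
    filenames = [joinpath(dir, "testdata$(i).dat") for i in 1:nfiles]

    # The do-block form of open closes the file automatically, even on error.
    for (i, name) in enumerate(filenames)
        open(name, "w") do io
            write(io, data[:, i])
        end
    end

    # Read each file back into a preallocated buffer of known length.
    restored = Matrix{Float64}(undef, nrows, nfiles)
    for (i, name) in enumerate(filenames)
        open(name, "r") do io
            restored[:, i] = read!(io, Vector{Float64}(undef, nrows))
        end
    end
    return data, restored
end

# Run inside a temporary directory that is removed afterwards.
data, restored = mktempdir() do dir
    roundtrip(dir)
end
```

Using the do-block form avoids keeping hundreds of file handles open at once, which the list-of-open-files approach does.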
Upvotes: 0
Reputation: 35
A simple solution where you don't have to explicitly enter filenames:
using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (a raw string keeps the backslashes in Windows paths literal)
files = glob("*.csv", path) # all CSV files in that folder (* matches any sequence of characters)
dfs = DataFrame.(CSV.File.(files)) # creates a vector of data frames
# add an index column to be able to later discern the different sources
for i in eachindex(dfs)
    dfs[i][!, :sample] .= i # the new column is called sample
end
# finally, if you want, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)
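As a possible shortcut for the index column, DataFrames.jl lets `vcat` (and `reduce(vcat, ...)`) add the source identifier itself through its `source` keyword, assuming a reasonably recent DataFrames.jl version. The sketch below uses small in-memory data frames instead of real files; the column names `id` and `value` and the `labels` vector are made up for illustration.

```julia
using DataFrames

# Hypothetical stand-ins for the data frames read from three CSV files
dfs = [DataFrame(id = 1:2, value = [1.0, 2.0]),
       DataFrame(id = 3:4, value = [3.0, 4.0]),
       DataFrame(id = 5:6, value = [5.0, 6.0])]
labels = ["Table_01.csv", "Table_02.csv", "Table_03.csv"]

# source = :sample alone would number the inputs 1, 2, 3;
# passing a Pair labels each row with the matching entry of labels instead.
df = reduce(vcat, dfs; source = :sample => labels)
```

With real files, `basename.(files)` would be a natural choice for the labels.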
Upvotes: 0
Reputation: 1474
This is how I have done it, but there might be an easier way.
using DataFrames, Glob
import CSV
function readcsvs(path)
    files = glob("*.csv", path) # Vector of file names. Glob lets you use the asterisk wildcard.
    numfiles = length(files) # Number of files to read.
    tempdfs = Vector{DataFrame}(undef, numfiles) # Preallocate a vector to hold the data frames.
    for i in 1:numfiles
        tempdfs[i] = CSV.read(files[i], DataFrame) # Read each CSV into its own data frame (current CSV.jl requires the sink argument).
    end
    # Join the temporary data frames into one data frame and return it.
    return outerjoin(tempdfs..., on="Column In Common")
end
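To make the join step concrete, here is a small sketch of what `outerjoin` does with three data frames that share one key column. The frames `a`, `b`, `c` and the column name `key` are invented for illustration; they stand in for the per-file data frames and the "Column In Common" above.

```julia
using DataFrames

# Three hypothetical tables that share the column key but differ otherwise
a = DataFrame(key = [1, 2], x = ["a1", "a2"])
b = DataFrame(key = [2, 3], y = ["b2", "b3"])
c = DataFrame(key = [3, 4], z = ["c3", "c4"])

# Keep every key that appears in any table; cells without a match become missing.
joined = outerjoin(a, b, c; on = :key)
```

A join like this suits files that hold different columns for the same entities. If the files all have the same columns and should simply be stacked, vertical concatenation with `vcat` is the better fit.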
Upvotes: 3
Reputation: 69839
If you want multiple data frames (rather than a single data frame holding the data from multiple files), there are several options.
Let me start with the simplest approach using broadcasting:
dfs = DataFrame.(CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))
or
dfs = @. DataFrame(CSV.File(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))
or (slightly more advanced, using function composition):
(DataFrame∘CSV.File).(["Table_01.csv", "Table_02.csv", "Table_03.csv"])
or using chaining:
CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]) .|> DataFrame
Other options are map, as was suggested in a comment:
map(DataFrame∘CSV.File, ["Table_01.csv", "Table_02.csv", "Table_03.csv"])
or just use a comprehension:
[DataFrame(CSV.File(f)) for f in ["Table_01.csv", "Table_02.csv", "Table_03.csv"]]
(I am listing the options to show different syntactic possibilities in Julia)
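All of the options above return a plain vector, so each table is looked up by position. If looking tables up by file name is more convenient, one more possibility is a dictionary built with a generator. This is only a sketch: the temporary directory and the sample file contents are assumptions made so that the example is self-contained.

```julia
using CSV, DataFrames

tables = mktempdir() do dir
    # Create three small sample files (made-up contents)
    files = [joinpath(dir, "Table_0$(i).csv") for i in 1:3]
    for (i, f) in enumerate(files)
        CSV.write(f, DataFrame(id = 1:2, value = [10 * i, 20 * i]))
    end
    # Map each file name (without its directory) to its data frame
    Dict(basename(f) => DataFrame(CSV.File(f)) for f in files)
end

tables["Table_02.csv"] # look up one table by its file name
```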
Upvotes: 5