user7295926

How do I load multiple CSV files into DataFrames in Julia?

I already know how to load a single CSV into a DataFrame:

using CSV
using DataFrames    
df = DataFrame(CSV.File("C:\\Users\\username\\Table_01.csv"))

How would I do this when I have several CSV files, e.g. Table_01.csv, Table_02.csv, Table_03.csv? Would I create a bunch of empty DataFrames and use a for loop to fill them? Or is there an easier way in Julia? Many thanks in advance!

Upvotes: 5

Views: 1874

Answers (4)

kominy

Reputation: 1

An example of opening, writing to, and closing many files in a loop. Reading works in a similar way.

    function main()
        f_max = 365
        # 100 x f_max matrix; every entry holds the same random value
        data = fill(rand(), 100, f_max)

        # Build the file names: ./testdata1.dat, ./testdata2.dat, ...
        filenames = ["./testdata$(i).dat" for i in 1:f_max]

        # Open every file for writing
        files = [open(file, "w") for file in filenames]

        # Write column i to file i as raw binary Float64 values
        for i in 1:f_max
            write(files[i], data[:, i])
        end

        # Close every file
        for i in 1:f_max
            close(files[i])
        end
    end

    main()
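
Since the answer only shows the writing side, here is a minimal sketch of the matching read step. It assumes the files contain raw Float64 values of a known length, as written above; the temporary directory, the file count, and the variable names are illustrative, not part of the original answer.

```julia
# Write a few columns to binary files, then read them back.
# The directory, file names, and sizes are illustrative assumptions.
dir = mktempdir()
nrows, nfiles = 100, 3
data = rand(Float64, nrows, nfiles)
filenames = [joinpath(dir, "testdata$(i).dat") for i in 1:nfiles]

# Write column i to file i as raw Float64 values
for (i, file) in enumerate(filenames)
    open(file, "w") do io
        write(io, data[:, i])
    end
end

# Read each file back into a preallocated vector of the same length
readback = [open(io -> read!(io, Vector{Float64}(undef, nrows)), file)
            for file in filenames]
```

The do-block form of open closes each file automatically, so no separate closing loop is needed.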

Upvotes: 0

hooman

Reputation: 35

A simple solution where you don't have to explicitly enter filenames:

using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (a raw string avoids escaping the backslashes in Windows paths)
files = glob("*.csv", path) # all CSV files in the folder (* matches any sequence of characters)
dfs = DataFrame.(CSV.File.(files)) # creates a vector of data frames

# add an index column so the different sources can be told apart later
for i in eachindex(dfs)
    dfs[i][!, :sample] .= i # the new column is called sample
end

# finally, if you want, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)
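
As a possible alternative to the explicit loop, recent DataFrames.jl releases accept a source keyword in vcat and reduce(vcat, ...) that adds the identifier column during concatenation. The sketch below assumes DataFrames.jl 1.x and a recent CSV.jl; the temporary directory and sample file contents are made up for illustration, and the files are listed with Base's readdir instead of Glob.

```julia
using CSV, DataFrames

# Hypothetical sample data written to a temporary directory
dir = mktempdir()
write(joinpath(dir, "Table_01.csv"), "a,b\n1,2\n3,4\n")
write(joinpath(dir, "Table_02.csv"), "a,b\n5,6\n")

# Collect the CSV paths with Base functions only (readdir returns them sorted)
files = filter(endswith(".csv"), readdir(dir; join = true))

# One data frame per file
dfs = [CSV.read(f, DataFrame) for f in files]

# `source` adds a column recording which data frame each row came from;
# here the identifiers are the file names rather than 1, 2, ...
df = reduce(vcat, dfs; source = :sample => basename.(files))
```

Passing just source = :sample instead of a pair should label the rows with integers, which matches the loop above.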

Upvotes: 0

Nathan Boyer

Reputation: 1474

This is how I have done it, but there might be an easier way.

using DataFrames, Glob
import CSV

function readcsvs(path)
    files = glob("*.csv", path)  # Vector of file paths. Glob expands the asterisk wildcard.
    numfiles = length(files)     # Number of files to read.
    tempdfs = Vector{DataFrame}(undef, numfiles)  # Preallocate an uninitialized vector of DataFrames.
    for i in 1:numfiles
        tempdfs[i] = CSV.read(files[i], DataFrame)  # Read each CSV into its own DataFrame (recent CSV.jl versions require the sink argument).
    end
    # Join the temporary DataFrames into one. Replace "Column In Common" with your shared key column.
    masterdf = outerjoin(tempdfs...; on="Column In Common")
    return masterdf
end
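
To show what the outerjoin step does, here is a small sketch with three in-memory data frames that share a key column. The column names (id, x, y, z) and values are invented for illustration, and it assumes a DataFrames.jl version that provides outerjoin with more than two arguments.

```julia
using DataFrames

# Three hypothetical tables sharing the key column `id`
a = DataFrame(id = [1, 2], x = [10, 20])
b = DataFrame(id = [2, 3], y = [200, 300])
c = DataFrame(id = [1, 3], z = [1.5, 3.5])

# Keep every id that appears in any table; absent values become `missing`
joined = outerjoin(a, b, c; on = :id)

# Row order after a join is not guaranteed, so sort by the key
sort!(joined, :id)
```

Note that outerjoin needs at least two data frames and that non-key columns with the same name in several tables would need makeunique=true or renaming first.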

Upvotes: 3

Bogumił Kamiński

Reputation: 69839

If you want multiple data frames (not a single data frame holding the data from multiple files), there are several options.

Let me start with the simplest approach using broadcasting:

dfs = DataFrame.(CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))

or

dfs = @. DataFrame(CSV.File(["Table_01.csv", "Table_02.csv", "Table_03.csv"]))

or (slightly more advanced, using function composition):

(DataFrame∘CSV.File).(["Table_01.csv", "Table_02.csv", "Table_03.csv"])

or using chaining:

CSV.File.(["Table_01.csv", "Table_02.csv", "Table_03.csv"]) .|> DataFrame

Another option is map, as suggested in a comment:

map(DataFrame∘CSV.File, ["Table_01.csv", "Table_02.csv", "Table_03.csv"])

or just use a comprehension:

[DataFrame(CSV.File(f)) for f in ["Table_01.csv", "Table_02.csv", "Table_03.csv"]]

(I am listing these options to show the different syntactic possibilities in Julia.)
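
If you would rather look up each data frame by file name than by position, one more possibility is a dictionary. This is only a sketch: the temporary directory, the sample file contents, and the names filenames and tables are assumptions made for the example.

```julia
using CSV, DataFrames

# Hypothetical sample files in a temporary directory
dir = mktempdir()
filenames = ["Table_01.csv", "Table_02.csv", "Table_03.csv"]
for (i, name) in enumerate(filenames)
    write(joinpath(dir, name), "id,value\n$(i),$(10i)\n")
end

# Map each file name to the data frame loaded from it
tables = Dict(name => DataFrame(CSV.File(joinpath(dir, name))) for name in filenames)

# Access one table by name
tables["Table_02.csv"]
```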

Upvotes: 5
