clearseplex

Reputation: 729

Reading CSV file in loop Dataframe (Julia)

I want to read multiple CSV files whose names change in a regular pattern, like "CSV_1.csv" and so on. My idea was to simply write a loop like the following:

using CSV
for i = 1:8
    a[i] = CSV.read("0.$i.csv")
end

but obviously that won't work. Is there a simple way of implementing this, such as introducing an additional dimension in the data frame?

Upvotes: 2

Views: 1272

Answers (3)

hooman

Reputation: 35

You can read an arbitrary number of CSV files whose names match a pattern, create one data frame per file, and finally, if you want, combine them into a single data frame.

using CSV, Glob, DataFrames
path = raw"C:\..." # directory of your files (a raw string, so backslashes in Windows paths need no escaping)
files = glob("*.csv", path) # all CSV files in the folder (* matches any sequence of characters)
dfs = DataFrame.(CSV.File.(files)) # creates a vector of data frames, one per file

# add an index column so the different sources can be told apart later
for i in eachindex(dfs)
    dfs[i][!, :sample] .= i # the new column is called sample
end

# finally, reduce your collection of dfs via vertical concatenation
df = reduce(vcat, dfs)
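
The integer index above only records the position of a file in the list. If the file name itself should end up in the combined data frame, the `source` keyword of `vcat`/`reduce(vcat, ...)` in DataFrames.jl can do that in one step. The following is a minimal sketch, assuming a reasonably recent Julia (1.5 or later) and DataFrames.jl 1.x; the temporary directory, the demo file contents, and the column name `file` are made up for illustration, and `readdir` is used here instead of Glob only to keep the example dependency-free.

```julia
using CSV, DataFrames

# Hypothetical demo data: two small CSV files in a temporary directory.
dir = mktempdir()
write(joinpath(dir, "0.1.csv"), "x,y\n1,2\n3,4\n")
write(joinpath(dir, "0.2.csv"), "x,y\n5,6\n")

# Collect the full paths of all CSV files; sort for a stable order.
files = sort(filter(endswith(".csv"), readdir(dir; join=true)))

# One data frame per file.
dfs = [CSV.read(f, DataFrame) for f in files]

# Stack them vertically; `source` adds a column (here called :file,
# an illustrative name) holding the base name of the originating file.
df = reduce(vcat, dfs; source = :file => basename.(files))
```

This replaces the explicit loop that adds the indicator column, at the cost of requiring a DataFrames.jl version that supports `source`.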

Upvotes: 0

Bogumił Kamiński

Reputation: 69869

As an alternative to what Kevin proposed, you can write:

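using CSV, DataFrames

# read the files into a vector of data frames
# (recent CSV.jl versions require the sink argument, here DataFrame)
a = CSV.read.(["0.$i.csv" for i in 1:8], DataFrame)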

# add an indicator column
for i in 1:8
    a[i][!, :id] .= i
end

# create a single data frame with indicator column holding the source
b = reduce(vcat, a)

Upvotes: 1

kevbonham

Reputation: 1040

Assuming a is meant to be an array, this is definitely possible. To do it with indexed assignment, though, you need to pre-allocate the array, since you can't assign to an index that doesn't exist yet:

julia> a = []
0-element Array{Any,1}

julia> a[1] = 1
ERROR: BoundsError: attempt to access 0-element Array{Any,1} at index [1]
Stacktrace:
 [1] setindex!(::Array{Any,1}, ::Any, ::Int64) at ./essentials.jl:455
 [2] top-level scope at REPL[10]:1

julia> a2 = Vector{Int}(undef, 5);

julia> for i in 1:5
           a2[i] = i
       end

julia> a2
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

Alternatively, you can use push!() to append elements to an array as you go.

julia> a3 = [];

julia> for i in 1:5
           push!(a3, i)
       end

julia> a3
5-element Array{Any,1}:
 1
 2
 3
 4
 5
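
Note that `[]` creates a `Vector{Any}`, as the output above shows. When the element type is known in advance, a typed empty vector can be used with `push!` in the same way. A small sketch (the name `a4` is just for illustration):

```julia
# Typed empty vector: elements must be (convertible to) Int.
a4 = Int[]

for i in 1:5
    push!(a4, i)
end
```

The result has element type `Int` rather than `Any`, which is usually preferable for performance and for catching accidental pushes of the wrong type.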

So for your CSV files:

using CSV, DataFrames

a = []

for i in 1:8
    # recent CSV.jl versions require the sink argument, here DataFrame
    push!(a, CSV.read("0.$i.csv", DataFrame))
end
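
The pre-allocation approach from the first example carries over to data frames as well. The following is a rough sketch, assuming a CSV.jl version that takes `DataFrame` as the sink argument; the temporary directory and the one-column demo files are hypothetical stand-ins for your real files.

```julia
using CSV, DataFrames

# Hypothetical demo data: three tiny CSV files named 0.1.csv, 0.2.csv, 0.3.csv.
dir = mktempdir()
for i in 1:3
    write(joinpath(dir, "0.$i.csv"), "x\n$i\n")
end

# Pre-allocate a vector with room for three data frames ...
a = Vector{DataFrame}(undef, 3)

# ... so that indexed assignment inside the loop is valid.
for i in 1:3
    a[i] = CSV.read(joinpath(dir, "0.$i.csv"), DataFrame)
end
```

This is the loop from the question with the one missing piece added: `a` exists, and has the right length, before `a[i]` is assigned.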

Upvotes: 3
