cupoftea21

Reputation: 31

Replace substring across Array of multiple data types in Julia

I have an array of multiple data types imported from a CSV file. I would like to remove all commas (,) and dollar signs ($). Three of the columns contain commas and dollar signs.

When I create a new array from a single column with commas and dollar signs, the code below works.

using CSV, DataFrames
df = DataFrame!(CSV.File("F:SampleFile.csv"))
dfmo = Array(df[!,30])
dfmo = collect(skipmissing(dfmo))
dfmo = replace.(dfmo,"\$"=>"")
dfmo = replace.(dfmo,","=>"")

When I try to apply the same replacement across the entire data frame with the code below:

df=replace.(df,","=>"")

I get an error:

MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
  similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\

I then tried indexing with the loop below and also got an error, this time for indexing into a string.

for i in df
    for j in df
        if datatype(df[i,j]) == String
            df=replace(df[i,j],","=>"")
        end
    end
end

MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
  similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\

What is the most efficient way to replace substrings across an array of multiple datatypes?

Upvotes: 3

Views: 106

Answers (1)

Bogumił Kamiński

Reputation: 69949

From your code, I understand that you want an in-place operation (i.e., one that changes the original data frame).

Using a loop, as in your code, you can do this:

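# Visit every cell; only string cells are modified, everything else is left as-is.
for col in axes(df, 2)
    for row in axes(df, 1)
        cell = df[row, col]
        if cell isa AbstractString
            # Write the cleaned string back into the same cell (in place).
            df[row, col] = replace(cell, "," => "")
        end
    end
end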
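
If the string columns can be recognized by their element type, the same cleanup can be sketched column by column, which avoids checking every cell. This is only an illustration: the sample data frame and the regex `[$,]` (which covers the dollar signs from the question as well as commas) are assumptions, and a column with a mixed `Any` element type would be skipped by this check.

```julia
using DataFrames

# Hypothetical sample data standing in for the imported CSV.
df = DataFrame(id = [1, 2], price = ["\$1,200", "\$35"])

for name in names(df)
    col = df[!, name]
    # Only touch columns whose non-missing element type is a string type.
    nonmissingtype(eltype(col)) <: AbstractString || continue
    # Replace the column with a cleaned copy, keeping missing values as they are.
    df[!, name] = [ismissing(x) ? missing : replace(x, r"[$,]" => "") for x in col]
end
```

Unlike the cell-by-cell loop, this version replaces each string column with a new vector rather than mutating the existing one.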

Using broadcasting you can achieve the same with:

helper_fun(cell) = cell isa AbstractString ? replace(cell, "," => "") : cell

df .= helper_fun.(df)
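
Since the question also mentions dollar signs, here is a small self-contained sketch of the broadcasting approach that strips both characters. The sample data and the name `strip_symbols` are made up for illustration; the regex `[$,]` is one possible way to match either character in a single `replace` call.

```julia
using DataFrames

# Hypothetical sample data; `note` contains a missing value.
df = DataFrame(qty = [3, 7], amount = ["\$2,500", "\$40"], note = ["a,b", missing])

# Remove commas and dollar signs from strings; return anything else unchanged.
strip_symbols(cell) = cell isa AbstractString ? replace(cell, r"[$,]" => "") : cell

# Broadcast over every cell and write the results back into the existing columns.
df .= strip_symbols.(df)
```

The cleaned values are still strings afterwards; converting them to numbers, for example with `parse.(Float64, df.amount)`, would be a separate step.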

Upvotes: 2
