I have a data frame imported from a CSV file, with columns of multiple data types. I would like to remove all commas (,) and dollar signs ($); three of the columns contain them.
For a single column, I can do this successfully by copying it into a new array:
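```julia
using CSV, DataFrames

df = CSV.read("F:SampleFile.csv", DataFrame)
dfmo = Array(df[!, 30])              # copy column 30 into a plain vector
dfmo = collect(skipmissing(dfmo))    # drop missing values
dfmo = replace.(dfmo, "\$" => "")    # remove dollar signs
dfmo = replace.(dfmo, "," => "")     # remove commas
```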
When I try to apply this across the entire data frame with

```julia
df = replace.(df, "," => "")
```

I get an error:
```
MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\
```
I then tried looping over the cells, as below, and also got an error for indexing into a string:
```julia
for i in df
    for j in df
        if datatype(df[i,j]) == String
            df=replace(df[i,j],","=>"")
        end
    end
end
```
```
MethodError: no method matching similar(::Int64, ::Type{Any})
Closest candidates are:
similar(!Matched::ZMQ.Message, ::Type{T}, !Matched::Tuple{Vararg{Int64,N}} where N) where T at C:\Users\
```
What is the most efficient way to replace substrings across a data frame whose columns have multiple data types?
From your code, I understand that you want an in-place operation (i.e. to modify the original data frame).
With a loop, as in your code, you can do this:
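```julia
for col in axes(df, 2)          # column indices
    for row in axes(df, 1)      # row indices
        cell = df[row, col]
        # only string cells are modified; numbers and missing values are left alone
        if cell isa AbstractString
            df[row, col] = replace(cell, "," => "")
        end
    end
end
```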
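To try the loop without a CSV file or the DataFrames package, here is a minimal sketch that assumes a plain `Matrix{Any}` (the hypothetical `data` below) stands in for the data frame. It also strips the dollar signs the question asks about by chaining two `replace` calls; on Julia 1.7 and later, a single `replace` call with both pairs should work as well.

```julia
# Hypothetical stand-in for the data frame: a Matrix{Any} with mixed cell types.
data = Any["\$1,200" 3 missing;
           "\$45"    7 "a,b"]

# Strip commas and dollar signs from every string cell, in place.
for col in axes(data, 2)
    for row in axes(data, 1)
        cell = data[row, col]
        if cell isa AbstractString
            # Chained calls work on all Julia 1.x versions; Julia 1.7+ also
            # accepts several pairs at once: replace(cell, "," => "", "\$" => "")
            data[row, col] = replace(replace(cell, "," => ""), "\$" => "")
        end
    end
end

println(data)
```

Numeric and `missing` cells pass through untouched, because only `AbstractString` cells are rewritten.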
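Using broadcasting, you can achieve the same with:

```julia
# return string cells with commas removed; pass any other value through unchanged
helper_fun(cell) = cell isa AbstractString ? replace(cell, "," => "") : cell
df .= helper_fun.(df)    # apply to every cell and write the results back in place
```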
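As for why `replace.(df, "," => "")` fails: broadcasting calls `replace` on every cell, including the `Int64` ones. For a non-string argument, Julia falls back to the generic `replace(A, old_new::Pair...)` method for collections, which ends up calling `similar` on the integer, exactly as the error message shows. The sketch below reproduces this on a small hypothetical `Vector{Any}` (`cells`) instead of a data frame, then shows that the type check in `helper_fun` avoids it.

```julia
# Hypothetical mixed-type cells standing in for the data frame's contents.
cells = Any["\$1,200", 3, missing, "a,b"]

# Without a type check, broadcasting reaches replace(3, "," => ""),
# which throws a MethodError (caught here so the script keeps running).
err = try
    replace.(cells, "," => "")
    nothing
catch e
    e
end
println(typeof(err))

# Same helper as in the answer: only string cells are changed.
helper_fun(cell) = cell isa AbstractString ? replace(cell, "," => "") : cell

# In-place update, mirroring df .= helper_fun.(df)
cells .= helper_fun.(cells)
println(cells)
```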