Reputation: 935
I have some data that is badly formatted. Specifically I have numeric columns that have some elements with spurious text in them (e.g. "8 meters" instead of "8"). I want to use readtable to read in the data, make the necessary fixes to the data and then convert the column to a Float64 so that it behaves correctly (comparison, etc).
There seems to have been a macro called @transform that would do the conversion but it has been deleted. How do I do this now?
My best solution at the moment is to clean up the data, write it out as a csv and then re-read it using readtable and specify eltypes. But that is horrible.
What else can I do?
Upvotes: 6
Views: 6963
Reputation: 31
lets say you had a dataframe = df
and a column B
that has strings to convert.
First this converts a string to a float and returns NA if a failure:
string_to_float(str) = try convert(Float64, str) catch return(NA) end
Then transform that column:
df[:B] = map(string -> string_to_float string, df[:B])
an alternative shorter version
df[:B] = map(string_to_float, df[:B])
Upvotes: 3
Reputation: 1843
There is no need to run things via a csv file. You can change or update the DataFrame directly.
using DataFrames
# Lets make up some data
df=DataFrame(A=rand(5),B=["8", "9 meters", "4.5", "3m", "12.0"])
# And then make a function to clean the data
function fixdata(arr)
result = DataArray(Float64, length(arr))
reg = r"[0-9]+\.*[0-9]*"
for i = 1:length(arr)
m = match(reg, arr[i])
if m == nothing
result[i] = NA
else
result[i] = float64(m.match)
end
end
result
end
# Then just apply the function to the column to clean the data
# and then replace the column with the cleaned data.
df[:B] = fixdata(df[:B])
Upvotes: 5