user492922
user492922

Reputation: 935

How do I convert the type of a Dataframe column in Julia?

I have some data that is badly formatted. Specifically I have numeric columns that have some elements with spurious text in them (e.g. "8 meters" instead of "8"). I want to use readtable to read in the data, make the necessary fixes to the data and then convert the column to a Float64 so that it behaves correctly (comparison, etc).

There seems to have been a macro called @transform that would do the conversion but it has been deleted. How do I do this now?

My best solution at the moment is to clean up the data, write it out as a csv and then re-read it using readtable and specify eltypes. But that is horrible.

What else can I do?

Upvotes: 6

Views: 6963

Answers (2)

bsouthwood
bsouthwood

Reputation: 31

lets say you had a dataframe = df and a column B that has strings to convert.

First this converts a string to a float and returns NA if a failure:

string_to_float(str) = try convert(Float64, str) catch return(NA) end

Then transform that column:

df[:B] = map(string -> string_to_float string, df[:B])

an alternative shorter version

df[:B] = map(string_to_float, df[:B])

Upvotes: 3

Mr Alpha
Mr Alpha

Reputation: 1843

There is no need to run things via a csv file. You can change or update the DataFrame directly.

using DataFrames
# Lets make up some data
df=DataFrame(A=rand(5),B=["8", "9 meters", "4.5", "3m", "12.0"])

# And then make a function to clean the data
function fixdata(arr)
    result = DataArray(Float64, length(arr))
    reg = r"[0-9]+\.*[0-9]*"
    for i = 1:length(arr)
        m = match(reg, arr[i])
        if m == nothing
            result[i] = NA
        else
            result[i] = float64(m.match)
        end
    end
    result
end

# Then just apply the function to the column to clean the data
# and then replace the column with the cleaned data.
df[:B] = fixdata(df[:B])

Upvotes: 5

Related Questions