HMadadi

Reputation: 391

Lambda function in concatenated dataframes in Vaex

I have multiple tar files, each containing multiple CSV files. I want to open all of the CSV files as vaex dataframes, add a new column to each using a lambda function, and then concatenate them, but I get the error below. How can I do this?

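import io
import os
import re
import tarfile

import vaex

def get_years_files(num_years):
    files = os.listdir("myfiles")
    # Raw string so that \d is not treated as an invalid string escape
    years = [int(re.findall(r'\d+', file)[0]) for file in files]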
    return years

def process(yearfiles):  
    lst = []
    for yearfile in yearfiles: 
        tar = tarfile.open("myfiles/" + yearfile, "r")
        for member in tar:
            if ".csv" in member.name:
                vx = vaex.from_csv(io.BytesIO(tar.extractfile(member).read()))
                vx['WMO'] = vx['STATION'].apply(lambda x: str(x)[:-5])
                lst.append(vx)

        tar.close()
    df_vx = vaex.concat(lst)
    return df_vx

Error:

ValueError: Unequal function lambda_function in concatenated dataframes are not supported yet

Upvotes: 0

Views: 167

Answers (1)

alec_djinn

Reputation: 10789

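The error message is fairly clear: vaex cannot concatenate dataframes whose applied functions are unequal, and a lambda written inside the loop is a new function object for every file. Define a regular function once and pass that to apply instead: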

def f(x):
    return str(x)[:-5]

...

vx['WMO'] = vx['STATION'].apply(f)
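
To see why a function defined once behaves differently from a lambda here, the sketch below uses plain Python only (no vaex). It assumes, based on the wording of the error, that vaex compares the functions registered on each dataframe for equality; that is a guess about its internals, not something confirmed by its documentation. The name strip_suffix is just an illustrative stand-in for f above.

```python
def strip_suffix(x):
    """Drop the last five characters of the value's string form."""
    return str(x)[:-5]

# A lambda expression evaluated inside a loop builds a new function object
# on every iteration, even though the source text is identical.
lambdas = [lambda x: str(x)[:-5] for _ in range(2)]

# Referring to a function that was defined once always yields the same object.
named = [strip_suffix for _ in range(2)]

# All three callables compute the same result...
same_behavior = (
    lambdas[0]("ABC12345") == lambdas[1]("ABC12345") == strip_suffix("ABC12345")
)
# ...but Python compares functions by identity, so the two lambdas are unequal.
lambdas_equal = lambdas[0] == lambdas[1]
named_equal = named[0] == named[1]

print(same_behavior, lambdas_equal, named_equal)  # True False True
```

So the lambdas do the same work but never compare equal, while the named function is the same object in every dataframe.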

Upvotes: 0
