I have multiple tar files, each containing multiple CSV files. I want to open all the CSV files as vaex dataframes, add a new column to each one with a lambda function, and then concatenate them, but I get the error below. How can I do it?
```python
import io
import os
import re
import tarfile

import vaex


def get_years_files(num_years):
    files = os.listdir("myfiles")
    years = [int(re.findall(r'\d+', file)[0]) for file in files]
    return years


def process(yearfiles):
    lst = []
    for yearfile in yearfiles:
        tar = tarfile.open("myfiles/" + yearfile, "r")
        for member in tar:
            if ".csv" in member.name:
                vx = vaex.from_csv(io.BytesIO(tar.extractfile(member).read()))
                # WMO id = STATION with its last 5 characters dropped
                vx['WMO'] = vx['STATION'].apply(lambda x: str(x)[:-5])
                lst.append(vx)
        tar.close()
    df_vx = vaex.concat(lst)
    return df_vx
```
Error:

```
ValueError: Unequal function lambda_function in concatenated dataframes are not supported yet
```
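For reference, the tar-reading part of the loop above can be sketched with only the standard library. This is a minimal sketch under stated assumptions: it swaps `vaex.from_csv` for Python's `csv` module so it runs without vaex, assumes the CSVs are UTF-8 with a `STATION` header, and `read_csv_members` / `station_to_wmo` are hypothetical names. It opens the archive in a `with` block so it is closed even on error, and uses `member.isfile()` so directories are skipped before `extractfile` is called.

```python
import csv
import io
import tarfile


def station_to_wmo(station):
    # Hypothetical helper: same slicing as the question's lambda (drop last 5 chars).
    return str(station)[:-5]


def read_csv_members(tar_path=None, fileobj=None):
    """Yield (member name, list of row dicts) for every .csv file in a tar archive.

    Pass either a filesystem path (tar_path) or an open binary file object (fileobj).
    """
    with tarfile.open(name=tar_path, mode="r", fileobj=fileobj) as tar:
        for member in tar:
            # Skip directories/links and anything that is not a .csv file.
            if not member.isfile() or not member.name.endswith(".csv"):
                continue
            raw = tar.extractfile(member).read()
            # Assumption: the CSV files are UTF-8 encoded.
            rows = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
            for row in rows:
                row["WMO"] = station_to_wmo(row["STATION"])
            yield member.name, rows
```

With a real archive this would be called as `read_csv_members("myfiles/<name>.tar")`; the `fileobj` parameter just makes it easy to test against an in-memory archive.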
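The error message says it: vaex does not (yet) support concatenating dataframes whose columns were built with `apply` and a lambda, because each dataframe ends up carrying its own, unequal `lambda_function`. A regular named function works instead: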
```python
def f(x):
    return str(x)[:-5]

...
vx['WMO'] = vx['STATION'].apply(f)
```
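Why does a named function help? The error mentions *unequal* functions under the single name `lambda_function`, which suggests vaex registers each `apply` callable by name and refuses to merge two different objects under the same name. That is an inference from the error message, not something checked against vaex's source. The plain-Python sketch below (no vaex needed; `station_to_wmo` and `make_funcs` are hypothetical names) shows the underlying difference: the question's loop evaluates the lambda expression once per CSV file, creating a new function object each time, whereas a module-level function is one object shared by every dataframe.

```python
def station_to_wmo(x):
    # Module-level function: a single object, shared by every caller.
    return str(x)[:-5]


def make_funcs(use_lambda, n=3):
    """Mimic the question's loop: pick one function per CSV file."""
    funcs = []
    for _ in range(n):
        if use_lambda:
            funcs.append(lambda x: str(x)[:-5])  # a new object on every iteration
        else:
            funcs.append(station_to_wmo)  # the same object every time
    return funcs


lambdas = make_funcs(use_lambda=True)
named = make_funcs(use_lambda=False)

print(lambdas[0].__name__)                              # <lambda>
print(all(f is lambdas[0] for f in lambdas))            # False
print(all(f is named[0] for f in named))                # True
print(lambdas[0](7203450123) == named[0](7203450123))   # True
```

By the same reasoning, the named function should be defined once (for example at module level) rather than inside the loop, since a `def` inside the loop would also create a fresh object per file.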
Upvotes: 0