Reputation: 471
Suppose you have a column of file paths like these and want the filename without its extension for each one:
relfilepath
0 20210322636.pdf
12 factuur-f23622.pdf
14 ingram micro.pdf
19 upfront.nl domein - Copy.pdf
21 upfront.nl domein.pdf
Name: relfilepath, dtype: object
I came up with the following; however, for the first item the result becomes a number and the output is '20210322636.0'.
from pathlib import Path
for i, row in dffinalselection.iterrows():
    dffinalselection['xmlfilename'][i] = Path(dffinalselection['relfilepath'][i]).stem
dffinalselection['xmlfilename'] = dffinalselection['xmlfilename'].astype(str)
This is wrong; it should be '20210322636'.
Please help!
Upvotes: 1
Views: 1802
Reputation: 95872
Your use of Path(...).stem was correct, but your operation on the dataframe was not.
from pathlib import Path
for i, row in dffinalselection.iterrows():
    dffinalselection['xmlfilename'][i] = Path(dffinalselection['relfilepath'][i]).stem # THIS WILL NOT RELIABLY MUTATE THE DATAFRAME
dffinalselection['xmlfilename'] = dffinalselection['xmlfilename'].astype(str) # THIS OVERWROTE EVERYTHING
Instead, just do:
from pathlib import Path
dffinalselection['xmlfilename'] = ''
for row in dffinalselection.itertuples():
    # itertuples() exposes the index label as row.Index (capital I)
    dffinalselection.at[row.Index, 'xmlfilename'] = Path(row.relfilepath).stem
Or,
dffinalselection['xmlfilename'] = dffinalselection['relfilepath'].apply(lambda value: Path(value).stem)
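For anyone who wants to try the vectorised version without the original data, here is a minimal sketch that rebuilds the column from the question. The frame name `df` and the helper `add_stem_column` are made up for illustration, and it assumes every value in the column is a string.

```python
from pathlib import Path

import pandas as pd

# Sample data copied from the question; index labels match the printed Series.
df = pd.DataFrame(
    {
        "relfilepath": [
            "20210322636.pdf",
            "factuur-f23622.pdf",
            "ingram micro.pdf",
            "upfront.nl domein - Copy.pdf",
            "upfront.nl domein.pdf",
        ]
    },
    index=[0, 12, 14, 19, 21],
)


def add_stem_column(frame, source="relfilepath", target="xmlfilename"):
    """Hypothetical helper: return a copy with the extension-free name in `target`."""
    out = frame.copy()
    # Path(...).stem always returns a str, so the digits-only name stays text.
    out[target] = out[source].map(lambda value: Path(value).stem)
    return out


result = add_stem_column(df)
print(result["xmlfilename"].tolist())
```

Working on a copy and assigning the whole column at once avoids the chained assignment used in the question.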
Upvotes: 1
Reputation: 18406
If the column values are always a filename or file path, split each one from the right on '.' with the maxsplit parameter (n) set to 1, and take the first element of the result.
>>> df['relfilepath'].str.rsplit('.', n=1).str[0]
0 20210322636
12 factuur-f23622
14 ingram micro
19 upfront.nl domein - Copy
21 upfront.nl domein
Name: relfilepath, dtype: object
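One thing to keep in mind: splitting on the last dot only strips the extension, so any directory part stays in the result, and a dot inside a folder name can be mistaken for the extension separator. The sketch below compares the split with `PurePosixPath(...).stem` on a few hypothetical paths that are not from the question (forward slashes assumed as the separator).

```python
from pathlib import PurePosixPath

import pandas as pd

# Hypothetical paths, chosen only to show where the two approaches differ.
paths = pd.Series(["report.pdf", "docs/report.pdf", "my.folder/readme"])

# Split once from the right on "." and keep the left-hand piece.
by_rsplit = paths.str.rsplit(".", n=1).str[0]

# Take the final path component and drop its last suffix, if any.
by_stem = paths.map(lambda p: PurePosixPath(p).stem)

comparison = pd.DataFrame({"path": paths, "rsplit": by_rsplit, "stem": by_stem})
print(comparison)
```

For bare filenames like the ones in the question, both approaches give the same result.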
Upvotes: 2