Reputation: 3318
I have a list of "pickle" files (see Image 1). I want to use each file's name as the index of a pandas DataFrame, but so far I get the full (long) path plus the file name.
I found this question: How to get the filename without the extension from a path in Python?
The answer there uses ".stem", but I don't know where to put it in my code. Also, my files don't have an extension.
import pandas as pd
from pathlib import Path
# This is the path to the folder which contains all the "pickle" files
dir_path = Path(r'C:\Users\OneDrive\Projects\II\Coral\Classification\inference_time')
files = dir_path.glob('**/file_inference_time*')
df_list = []  # empty list that will collect one DataFrame per file
for file in files:
    df = pd.DataFrame(pd.read_pickle(file))  # load each "pickle" file into a DataFrame
    df['file'] = file  # add a column 'file' holding the full path of the file
    df_list.append(df)  # collect the DataFrames in a list
df_list_all = pd.concat(df_list).reset_index(drop=True)  # merge all DataFrames into a single one
df_list_all
This is what I get:
              Inference_Time  file
0                       2.86  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_InceptionV1
1                      30.96  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_mobileNetV2
2                      11.04  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_efficientNet
I want this:
              Inference_Time  file
InceptionV1             2.86  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_InceptionV1
mobileNetV2            30.96  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_mobileNetV2
efficientNet           11.04  C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_efficientNet
IMAGE 1
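For the question as asked, here is a minimal sketch of where ".stem" can go: inside the loop each file is a pathlib.Path, so file.stem gives the bare file name without the directories (for a name with no extension, .stem and .name are the same), and stripping the common prefix inference_time_ leaves the label for the index. This assumes each pickle unpickles to something pd.DataFrame accepts, such as a dict with an Inference_Time column, and that the glob pattern matches names like those in the output above; the helper load_inference_times, the PREFIX constant and the demo files written to a temporary folder are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

PREFIX = "inference_time_"  # assumed common prefix of every pickle file name


def load_inference_times(dir_path: Path, pattern: str) -> pd.DataFrame:
    """Hypothetical helper: read every matching pickle, index rows by model name."""
    df_list = []
    for file in sorted(dir_path.glob(pattern)):
        df = pd.DataFrame(pd.read_pickle(file))
        df["file"] = str(file)  # keep the full path as a column
        # file.stem = file name without directories (and without an extension,
        # if there were one); drop the common prefix to get the index label
        stem = file.stem
        label = stem[len(PREFIX):] if stem.startswith(PREFIX) else stem
        df.index = [label] * len(df)
        df_list.append(df)
    return pd.concat(df_list)


if __name__ == "__main__":
    # Demo: write extension-less pickles shaped like the question's data
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name, t in [("InceptionV1", 2.86), ("mobileNetV2", 30.96), ("efficientNet", 11.04)]:
            pd.to_pickle({"Inference_Time": [t]}, folder / f"{PREFIX}{name}")
        print(load_inference_times(folder, f"**/{PREFIX}*"))
```

Because .stem drops everything after the last dot, file.name would be the safer choice if a model name could ever contain a dot.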
Upvotes: 0
Views: 899
Reputation: 2553
Check out pandas-path, which gives you a .path accessor on Series that exposes all of the normal pathlib methods and properties.
import pandas as pd
from pandas_path import path  # importing this registers the .path accessor
# These could be Windows paths too; they are POSIX paths only because I am on a POSIX machine
data = [
("folder/inference_time_InceptionV1", 10),
("folder2/inference_time_mobileNetV2", 20),
("folder4/inference_time_efficientNet", 30),
]
df = pd.DataFrame(data, columns=['file', 'time'])
(
df.file.path.name # use path accessor from pandas_path to get just the filename
.str.split('_') # split into components based on "_"
.str[-1] # select last component
)
#> 0 InceptionV1
#> 1 mobileNetV2
#> 2 efficientNet
#> Name: file, dtype: object
Created at 2021-03-06 10:57:59 PST by reprexlite v0.4.2
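If adding a dependency is not an option, a rough equivalent with only the standard library is sketched below. It swaps in pathlib.PureWindowsPath, which accepts both "\" and "/" as separators, so .name works on Windows-style paths even on a POSIX machine; the mixed sample paths are illustrative only.

```python
from pathlib import PureWindowsPath

import pandas as pd

data = [
    (r"C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_InceptionV1", 2.86),
    ("folder2/inference_time_mobileNetV2", 30.96),
    ("folder4/inference_time_efficientNet", 11.04),
]
df = pd.DataFrame(data, columns=["file", "Inference_Time"])

# PureWindowsPath splits on both "\" and "/", so .name works on any OS
names = df["file"].map(lambda p: PureWindowsPath(p).name)
# keep everything after the last "_" (same rule as the pandas-path answer);
# .tolist() gives a plain list, so the new index has no name
df.index = names.str.split("_").str[-1].tolist()
print(df)
```

Like the answer above, this takes the last "_"-separated piece, so it would cut a model name that itself contains an underscore.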
Upvotes: 1
Reputation: 34046
You can transform your output like this:
In [1603]: df
Out[1603]:
Inference_Time file
0 2.86 C:\Users\OneDrive\Projects\Classification\infe...
1 30.96 C:\Users\OneDrive\Projects\Classification\infe...
2 11.04 C:\Users\OneDrive\Projects\Classification\infe...
In [1607]: df = df.set_index(df['file'].str.split('inference_time_').str[-1])
In [1610]: df.index.name = None
In [1608]: df
Out[1608]:
Inference_Time file
InceptionV1 2.86 C:\Users\OneDrive\Projects\Classification\infe...
mobileNetV2 30.96 C:\Users\OneDrive\Projects\Classification\infe...
efficientNet 11.04 C:\Users\OneDrive\Projects\Classification\infe...
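A variant of this answer, as a sketch: Series.str.extract with an anchored regex pulls out the model name in one step, and rename_axis(None) clears the index name inside the same chain instead of a separate statement. The regex assumes every path ends in inference_time_<model>, with a model name made only of letters, digits and underscores; paths that don't match would get NaN as their index label.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Inference_Time": [2.86, 30.96, 11.04],
        "file": [
            r"C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_InceptionV1",
            r"C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_mobileNetV2",
            r"C:\Users\OneDrive\Projects\Classification\inference_time\inference_time_efficientNet",
        ],
    }
)

# Capture what follows "inference_time_" at the very end of each path.
# The folder "inference_time\" is not matched, because it has no trailing "_".
model = df["file"].str.extract(r"inference_time_(\w+)$", expand=False)
df = df.set_index(model).rename_axis(None)  # use the model names, drop the index name
print(df)
```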
Upvotes: 1