Reputation: 1168
I have a list of files:
files=['D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat',
'D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat',
'D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat']
From which I am extracting information encoded in the names:
import pandas as pd
dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files] #https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python
df=pd.DataFrame.from_dict(dict_of_titles)
Creating this dataframe:
       C      D        I               N    T   XV  batch    d
0  1.dat  12-09  100-100             NaN  300   40      1  250
1      1  12-09  100-500  after-rest.dat  300   40      1  250
2  1.dat  12-09  100-500             NaN  180  150      1  350
However, I also want a column 'files' in the dataframe that stores the filename corresponding to each row, for example:
       C      D        I               N    T   XV  batch    d  files
0  1.dat  12-09  100-100             NaN  300   40      1  250  'D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat',
1      1  12-09  100-500  after-rest.dat  300   40      1  250  'D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat',
2  1.dat  12-09  100-500             NaN  180  150      1  350  'D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat'
I am thinking of using the sort function on the list files and then just appending files as a column:
files.sort()
dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files] #https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python
df=pd.DataFrame.from_dict(dict_of_titles)
df['files']=files
Does this guarantee that the files will be parsed in the right order?
Upvotes: 0
Views: 40
Reputation: 149075
A list maintains its order. That means that whether it is sorted (according to a particular key) or not, it will always be iterated in the same order.
So whatever the list order, the comprehension builds one dict per file in that same order, and this guarantees a correct row alignment:
dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files]
df=pd.DataFrame.from_dict(dict_of_titles)
df['files']=files
Using sort will give a specific order to the list (and to the rows of the dataframe), and the above code will still be valid.
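If you want to convince yourself, here is a minimal sketch that builds the dataframe from the list in two different orders and checks that every row still matches its own filename. The helpers parse_name, build_frame and rows_match are hypothetical names introduced only for this illustration, and I am assuming split("_", 1) so that a value containing an underscore would not break the key/value pairs (the original code uses a plain split("_")).

```python
import pandas as pd

files = [
    "D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat",
    "D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat",
    "D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat",
]


def parse_name(name):
    """Hypothetical helper: turn 'key_value, key_value, ...' into a dict."""
    # Assumption: split on the first underscore only.
    return dict(item.split("_", 1) for item in name.split(", "))


def build_frame(names):
    """Hypothetical helper: one row per filename, in the order of `names`."""
    rows = [parse_name(name) for name in names]  # same order as `names`
    df = pd.DataFrame(rows)
    df["files"] = list(names)  # assigned positionally, row i gets names[i]
    return df


def rows_match(frame):
    """Check that each row's parsed values come from that row's filename."""
    for _, row in frame.iterrows():
        parsed = {
            key: value
            for key, value in row.items()
            if key != "files" and pd.notna(value)  # skip keys absent from the name
        }
        if parsed != parse_name(row["files"]):
            return False
    return True


df = build_frame(files)                # original order
df_sorted = build_frame(sorted(files))  # sorted order
df_reversed = build_frame(files[::-1])  # any other order works as well

print(rows_match(df), rows_match(df_sorted), rows_match(df_reversed))
```

The alignment holds because both the rows and the 'files' column are produced from the same list in a single pass; sorting only changes which file ends up in which row, not whether a row matches its filename.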
Upvotes: 1