Leo

Reputation: 1168

Pandas: does sort guarantee the order in which a function is executed on a list?

I have a list of files:

files=['D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat',
 'D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat',
 'D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat']

From which I am extracting information encoded in the names:

import pandas as pd
dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files] #https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python
df=pd.DataFrame.from_dict(dict_of_titles)

Creating this dataframe:

       C      D        I               N    T   XV batch    d
0  1.dat  12-09  100-100             NaN  300   40     1  250
1      1  12-09  100-500  after-rest.dat  300   40     1  250
2  1.dat  12-09  100-500             NaN  180  150     1  350


However, I also want the dataframe to have a column 'files' storing the filename corresponding to each row, for example:

       C      D        I               N    T   XV batch    d files
0  1.dat  12-09  100-100             NaN  300   40     1  250 'D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat',
1      1  12-09  100-500  after-rest.dat  300   40     1  250 'D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat',
2  1.dat  12-09  100-500             NaN  180  150     1  350 'D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat'

I am thinking of using the sort function on the list files and then just appending files as a column:

files.sort()
dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files] #https://stackoverflow.com/questions/186857/splitting-a-semicolon-separated-string-to-a-dictionary-in-python
df=pd.DataFrame.from_dict(dict_of_titles)
df['files']=files

Does this guarantee that the files will be parsed in the right order?

Upvotes: 0

Views: 40

Answers (1)

Serge Ballesta

Reputation: 149075

A list maintains its order. That means that whether it is sorted (according to a particular key) or not, it will always be scanned in the same order.
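As a rough illustration of that point, here is a minimal sketch in plain Python. The names list and the parse helper are made up for the example; they are not from the question. It only shows that a list comprehension produces its results in the same positions as its input, before and after an in-place sort.

```python
# Hypothetical short names, standing in for the real filenames.
names = ["b_2, a_1", "a_1, b_3", "c_9"]

def parse(name):
    # Split "key_value" items into a dict, one dict per name.
    return dict(item.split("_", 1) for item in name.split(", "))

# The comprehension walks the list front to back, so parsed[i]
# always comes from names[i].
parsed = [parse(n) for n in names]
pairs_match = all(parse(n) == p for n, p in zip(names, parsed))

# list.sort() reorders the list in place; a new comprehension then
# follows the new order, again position by position.
names.sort()
parsed_sorted = [parse(n) for n in names]
```

Sorting changes which name sits at which position, but not the fact that the comprehension visits positions in order.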

So whatever the list order, this will guarantee a correct row alignment:

dict_of_titles=[dict(item.split("_") for item in file.split(", ")) for file in files]
df=pd.DataFrame.from_dict(dict_of_titles)
df['files']=files

Using sort will give a specific order to the list (and to the rows of the dataframe) and the above code will still be valid.
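If you want to convince yourself, a small check along these lines should do it. This is only a sketch: parse_name and build_frame are hypothetical helper names, and split("_", 1) is a small variation on the original split("_") that I am assuming is acceptable (it keeps everything after the first underscore as the value).

```python
import pandas as pd

# The three filenames from the question, deliberately not in sorted order.
files = [
    "D_12-09, batch_1, d_350, T_180, XV_150, I_100-500, C_1.dat",
    "D_12-09, batch_1, d_250, T_300, XV_40, I_100-100, C_1.dat",
    "D_12-09, batch_1, d_250, T_300, XV_40, I_100-500, C_1, N_after-rest.dat",
]

def parse_name(name):
    # Turn "key_value, key_value, ..." into a dict of strings.
    return dict(item.split("_", 1) for item in name.split(", "))

def build_frame(names):
    # Rows and the 'files' column both come from the same list,
    # so row i and names[i] describe the same file.
    df = pd.DataFrame([parse_name(n) for n in names])
    df["files"] = names
    return df

def rows_aligned(df):
    # Re-parse each stored filename and compare its 'd' value with the row.
    return all(parse_name(f)["d"] == d for f, d in zip(df["files"], df["d"]))

unsorted_df = build_frame(files)
sorted_df = build_frame(sorted(files))
```

rows_aligned should hold for both frames; sorting only changes the order of the rows, not which filename goes with which row.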

Upvotes: 1
