liebestod
liebestod

Reputation: 15

Stripping strings in order to sort rows in Dataframe

I have a master CSV file that looks something like this -- it's only 1 column.

column_name
string1010string
string1013string
string1014string
string1015string
string1016string
string1018string
string1020string

Then I have a temporary CSV that I would like to keep track of separtely but also merge it with the master CSV and have it sorted in ascending order only taking the integers into consideration. I am aware that I have to strip the strings (from the start & the end) of each row to isolate the integers and then sort but I'm not quite sure how to approach it after that.

column_name
string1011string
string1012string
string1017string
string1019string

My function looks something like this:

def output_master_concatenated(list1, list2):
    master= pd.concat([list1, list2])
    #sorting_system
    master.to_csv(str('master' + '.csv'), index = False, sep = ' ')
    return master

Ideally, this is what I would like it to look like.

column_name
string1010string
string1011string
string1012string
string1013string
string1014string
string1015string
string1016string
string1017string
string1018string
string1019string
string1020string

update: string(integer)string is actually a link, each row is basically the same link with only the integer changing

Upvotes: 1

Views: 96

Answers (1)

Corralien
Corralien

Reputation: 120469

Use sort_values with a custom key:

df = pd.concat([df1, df2])

num_sort = lambda x: x.str.extract('(\d+)', expand=False).str.zfill(10)
df = df.sort_values('column_name', key=num_sort, ignore_index=True)
print(df)

# Output
         column_name
0   string1010string
1   string1011string
2   string1012string
3   string1013string
4   string1014string
5   string1015string
6   string1016string
7   string1017string
8   string1018string
9   string1019string
10  string1020string

Upvotes: 1

Related Questions