jxu
jxu

Reputation: 1

reorder the csv file by index column with predefined order

I'm trying to reorder the csv file I generated with python, and can't make it work,

The data is as:

metrics   test1 test2 test3 test4 ...
aa-key1
aa-key3
bb-key4
cc--key2
dd--key5
...

the keys are not in not sorted alphabetical nor numerical. It's just need to be as a predefined order [key1, key2, key3, key4, key5]. I added aa, cc, bb dd to illustrate the order I always got. It's there any way I can re-order the table by the first column, and the CSV would like

metrics   test1 test2 test3 test4 ...
aa-key1
cc--key2
aa-key3
bb-key4
dd--key5
...

I had generated the csv with panda, by appending the columns after creation. Although I defined the dictionary as orderedDict, the order doesn't seem to take affected.

def create_csv(infile, result):
    dict = parse_log_to_dict(infile)

    with open(result, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(['metrics', get_file_name_only(infile)])
        for row in dict.items():
           writer.writerow(row)

I tried with:

  PREPERED_INDEX = [key1, key2, key3, key4, key5]
  reporter_df = pd.read_csv(result, index_col=0)
  reporter_df[0] = pd.Categorical(reporter_df[0], PREPERED_INDEX)
    reporter_df.sort_values('metrics')
    reporter_df.to_csv(result)

but got keyError on metrics, which is the title line I meant to be in the table.

Thanks!

Upvotes: 0

Views: 211

Answers (1)

Max Kaha
Max Kaha

Reputation: 922

So I created a csv with data similar to what you seem to have, I used the delimiter=";" because my .csv was seperated that way you likely dont need it:

df = pd.read_csv(r"D:\Dokumente\example.csv", delimiter=";")

Output:

    metrics  test1
0   key1         1
1   key4         4
2   key2         2
3   key5         5
4   key3         3

Then if you simply sort it by your column "metrics" it should do what you want to achieve:

df = df.sort_values("metrics")

Output:

  metrics  test1
0    key1      1
2    key2      2
4    key3      3
1    key4      4
3    key5      5

I believe by using index_col=0 you are using that entire column as index and it cant be accessed via the key "metrics" any longer. Atleast for me that seems to be the problem here when I try to reproduce it.

Additionally if you are worried about the messed up index you can then reset it in the next step with:

df = df.reset_index(drop=True)

Which then returns this for df:

  metrics  test1
0    key1      1
1    key2      2
2    key3      3
3    key4      4
4    key5      5

Hope this helps.

Upvotes: 1

Related Questions