Reputation: 1
I'm trying to reorder the csv file I generated with python, and can't make it work,
The data is as:
metrics test1 test2 test3 test4 ...
aa-key1
aa-key3
bb-key4
cc--key2
dd--key5
...
the keys are not in not sorted alphabetical nor numerical. It's just need to be as a predefined order [key1, key2, key3, key4, key5]. I added aa, cc, bb dd to illustrate the order I always got. It's there any way I can re-order the table by the first column, and the CSV would like
metrics test1 test2 test3 test4 ...
aa-key1
cc--key2
aa-key3
bb-key4
dd--key5
...
I had generated the csv with panda, by appending the columns after creation. Although I defined the dictionary as orderedDict, the order doesn't seem to take affected.
def create_csv(infile, result):
dict = parse_log_to_dict(infile)
with open(result, 'w') as f:
writer = csv.writer(f)
writer.writerow(['metrics', get_file_name_only(infile)])
for row in dict.items():
writer.writerow(row)
I tried with:
PREPERED_INDEX = [key1, key2, key3, key4, key5]
reporter_df = pd.read_csv(result, index_col=0)
reporter_df[0] = pd.Categorical(reporter_df[0], PREPERED_INDEX)
reporter_df.sort_values('metrics')
reporter_df.to_csv(result)
but got keyError on metrics, which is the title line I meant to be in the table.
Thanks!
Upvotes: 0
Views: 211
Reputation: 922
So I created a csv with data similar to what you seem to have, I used the delimiter=";" because my .csv was seperated that way you likely dont need it:
df = pd.read_csv(r"D:\Dokumente\example.csv", delimiter=";")
Output:
metrics test1
0 key1 1
1 key4 4
2 key2 2
3 key5 5
4 key3 3
Then if you simply sort it by your column "metrics" it should do what you want to achieve:
df = df.sort_values("metrics")
Output:
metrics test1
0 key1 1
2 key2 2
4 key3 3
1 key4 4
3 key5 5
I believe by using index_col=0 you are using that entire column as index and it cant be accessed via the key "metrics" any longer. Atleast for me that seems to be the problem here when I try to reproduce it.
Additionally if you are worried about the messed up index you can then reset it in the next step with:
df = df.reset_index(drop=True)
Which then returns this for df:
metrics test1
0 key1 1
1 key2 2
2 key3 3
3 key4 4
4 key5 5
Hope this helps.
Upvotes: 1