ScalaBoy

Reputation: 3392

How to conditionally sort X-axis values in Matplotlib plot?

I have the following DataFrame:

file     size
abc1.txt  2.1 MB
abc2.txt  1.0 MB
abc3.txt  1.5 MB
abc4.txt  767.9 KB

When I plot these data (plt.plot(df['file'], df['size'])), the KB and MB values come out in the wrong order and the axis is jumbled. How can I sort them so that the order starts with the KB values and continues with the MB values, like this?

767.9 KB  1.0 MB  1.5 MB  2.1 MB

Upvotes: 0

Views: 657

Answers (2)

mujjiga

Reputation: 16866

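import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'file': [1, 2, 3, 4], 'size': ['2.1 MB', '1.0 MB', '1.5 MB', '767.9 KB']})
# Multipliers from unit suffix to bytes (decimal units: 1 KB = 1000 bytes)
cv = {'B': 1, 'KB': 1e3, 'MB': 1e6, 'GB': 1e9, 'TB': 1e12}
# '2.1 MB' -> 2.1 * 1e6; a bare number without a unit is taken as bytes
df['size_bytes'] = df['size'].apply(lambda x: float(x.split()[0]) * cv[x.split()[1]]
                                    if len(x.split()) == 2 else float(x))
fig, ax = plt.subplots()
ax.plot(df['file'], df['size_bytes'])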
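The conversion gives a numeric column but does not reorder the rows, so the line still follows the original row order. A minimal sketch of the sorting the question asks for, assuming the same decimal unit table: sort by the byte column, then plot the file names in that order. The helper name parse_size is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Decimal multipliers (1 KB = 1000 bytes), as in the answer above
UNITS = {'B': 1, 'KB': 1e3, 'MB': 1e6, 'GB': 1e9, 'TB': 1e12}

def parse_size(text):
    """Hypothetical helper: turn '767.9 KB' into a byte count (float)."""
    parts = text.split()
    if len(parts) == 2:
        return float(parts[0]) * UNITS[parts[1]]
    return float(parts[0])  # a bare number is taken as bytes

df = pd.DataFrame({'file': ['abc1.txt', 'abc2.txt', 'abc3.txt', 'abc4.txt'],
                   'size': ['2.1 MB', '1.0 MB', '1.5 MB', '767.9 KB']})
df['size_bytes'] = df['size'].map(parse_size)

# Sort by the numeric column so the plot runs from the KB entry to the largest MB entry
df_sorted = df.sort_values('size_bytes').reset_index(drop=True)

fig, ax = plt.subplots()
ax.plot(df_sorted['file'], df_sorted['size_bytes'])
ax.set_ylabel('size (bytes)')
```

Because the file names are plotted as categories, Matplotlib places them along the x-axis in the order they first appear, which here is the sorted order: abc4.txt, abc2.txt, abc3.txt, abc1.txt.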

If you also want the y-axis labels in human-readable form:

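def to_human_readable(size):
    power = 1000
    n = 0
    mem = {0: '', 1: 'KB', 2: 'MB', 3: 'GB', 4: 'TB'}
    # Divide by 1000 until the value fits the largest suitable unit (TB at most)
    while size >= power and n < 4:
        size /= power
        n += 1
    return "{0:.1f} {1}".format(size, mem[n])

# Relabel the existing y ticks; negative ticks get a blank label
ax.set_yticklabels([to_human_readable(v) if v >= 0 else ' ' for v in
                    ax.get_yticks(minor=False)])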
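An alternative sketch, assuming Matplotlib's matplotlib.ticker.FuncFormatter: instead of overwriting the labels once, attach a formatter that is called for every tick, so the labels stay correct after zooming or rescaling. Recent Matplotlib versions may also warn when set_yticklabels is used without fixing the tick positions, which this approach sidesteps. The function name bytes_label and the sample data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def bytes_label(value, pos=None):
    """Hypothetical formatter: 767900 -> '767.9 KB' (decimal units, 1 KB = 1000 bytes)."""
    if value < 0:
        return ''  # hide negative ticks, as in the answer above
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    n = 0
    while value >= 1000 and n < len(units) - 1:
        value /= 1000
        n += 1
    return f"{value:.1f} {units[n]}"

fig, ax = plt.subplots()
ax.plot(['abc4.txt', 'abc2.txt', 'abc3.txt', 'abc1.txt'],
        [767.9e3, 1.0e6, 1.5e6, 2.1e6])
# FuncFormatter calls bytes_label(tick_value, tick_position) for each tick
ax.yaxis.set_major_formatter(FuncFormatter(bytes_label))
fig.canvas.draw()
```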


(Here 1 KB = 1000 bytes, the decimal convention used for storage sizes.)
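Whether "KB" means 1000 or 1024 bytes depends on the tool that produced the sizes. A small sketch, assuming the listing may use either convention: the hypothetical to_bytes helper takes a binary flag that switches the base from 1000 to 1024. For the four sizes in the question the sorted order is the same either way; only the absolute values change.

```python
# Power of the base for each unit suffix
UNIT_POWERS = {'B': 0, 'KB': 1, 'MB': 2, 'GB': 3, 'TB': 4}

def to_bytes(text, binary=False):
    """Hypothetical helper: '1.5 MB' -> bytes, using base 1000 (default) or 1024."""
    base = 1024 if binary else 1000
    number, _, unit = text.partition(' ')
    unit = unit.strip() or 'B'  # a bare number is taken as bytes
    return float(number) * base ** UNIT_POWERS[unit.upper()]

sizes = ['2.1 MB', '1.0 MB', '1.5 MB', '767.9 KB']
decimal_order = sorted(sizes, key=to_bytes)
binary_order = sorted(sizes, key=lambda s: to_bytes(s, binary=True))
```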

Upvotes: 3

Declan

Reputation: 634

First, it's reading your numbers as strings, so any ordering wouldn't really make much sense, and furthermore the spacing between the points doesn't represent the actual sizes.

Also, in general I'd say it's poor practice to mix different units on the same axis. It's better to convert everything to one unit:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame([['abc1.txt',  '2.1 MB'],
                   ['abc2.txt',  '1.0 MB'],
                   ['abc3.txt',  '1.5 MB'],
                   ['abc4.txt',  '767.9 KB']], columns=["file", 'size'])

# This is a list comprehension that splits the number out of the string, converts it to a float,
# and divides it by 1000 if the other part of the string is 'KB', so every value ends up in MB.
df['size_float'] = [float(x[0])/1000 if x[1]=='KB' else float(x[0]) for x in df['size'].str.split()]
plt.plot(df['file'],df['size_float'])
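A vectorized variant of the same conversion, sketched with pandas' Series.str.extract: a regular expression splits each string into number and unit, a unit table maps each unit to a factor in MB (assuming only B, KB, MB and GB occur), and sorting by the result puts the KB entry first.

```python
import pandas as pd

df = pd.DataFrame([['abc1.txt', '2.1 MB'],
                   ['abc2.txt', '1.0 MB'],
                   ['abc3.txt', '1.5 MB'],
                   ['abc4.txt', '767.9 KB']], columns=['file', 'size'])

# Factor that converts each unit to MB (decimal: 1 MB = 1000 KB)
TO_MB = {'B': 1e-6, 'KB': 1e-3, 'MB': 1.0, 'GB': 1e3}

# Column 0 = number, column 1 = unit, e.g. '767.9 KB' -> ('767.9', 'KB')
parts = df['size'].str.extract(r'^\s*([\d.]+)\s*([KMG]?B)\s*$')
df['size_mb'] = parts[0].astype(float) * parts[1].map(TO_MB)

# Sort so that a plot such as plt.plot(df['file'], df['size_mb']) starts at the smallest file
df = df.sort_values('size_mb').reset_index(drop=True)
```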

Upvotes: 1
