Reputation: 3392
I have the following DataFrame:
file size
abc1.txt 2.1 MB
abc2.txt 1.0 MB
abc3.txt 1.5 MB
abc4.txt 767.9 KB
When I plot these data with plt.plot(df['file'], df['size']), the KB and MB values are ordered incorrectly. How can I sort them so that the order starts with the KB values and continues with the MB values? The expected order is:
767.9 KB 1.0 MB 1.5 MB 2.1 MB
Upvotes: 0
Views: 657
Reputation: 16866
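import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'file': [1,2,3,4], 'size': ['2.1 MB', '1.0 MB', '1.5 MB', '767.9 KB']})
# Multipliers to bytes (decimal units: 1 KB = 1000 bytes)
cv = {'': 1, 'KB': 1e3, 'MB': 1e6, 'GB': 1e9, 'TB': 1e12}
# Split "2.1 MB" into number and unit; a bare number is taken as bytes
df['size_bytes'] = df['size'].apply(lambda x: float(x.split()[0])*cv[x.split()[1]]
                                    if len(x.split())==2 else float(x))
fig, ax = plt.subplots()
ax.plot(df['file'], df['size_bytes'])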
And if you want the y axis in human-readable form:
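def to_human_readable(size):
    # Divide by 1000 until the value fits the largest suitable unit
    power = 1000
    n = 0
    mem = {0 : '', 1: 'KB', 2: 'MB', 3: 'GB', 4: 'TB'}
    while size >= power and n < 4:
        size /= power
        n += 1
    return "{0} {1}".format(size, mem[n])

ticks = ax.get_yticks(minor=False)
ax.set_yticks(ticks)  # pin the tick positions so the labels stay aligned
ax.set_yticklabels([to_human_readable(v) if v >= 0 else ' ' for v in ticks])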
(In digital storage, 1 KB = 1000 bytes.)
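An alternative to writing the tick labels by hand is to attach a formatter to the axis, so the labels are recomputed whenever the ticks change (for example after zooming). This is only a sketch: the helper name format_bytes and the sample data are illustrative assumptions, and it assumes decimal units as above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def format_bytes(value, pos=None):
    """Format a byte count with decimal units (hypothetical helper)."""
    units = ['B', 'KB', 'MB', 'GB', 'TB']
    if value < 0:
        return ''  # no label for negative ticks
    n = 0
    # Divide by 1000 until the value fits the current unit or units run out.
    while value >= 1000 and n < len(units) - 1:
        value /= 1000
        n += 1
    return "{0:g} {1}".format(value, units[n])

fig, ax = plt.subplots()
# Sample sizes in bytes, matching the values from the question.
ax.plot([1, 2, 3, 4], [2.1e6, 1.0e6, 1.5e6, 767.9e3])
# FuncFormatter calls format_bytes(tick_value, tick_position) for every tick.
ax.yaxis.set_major_formatter(FuncFormatter(format_bytes))
```

Note that the "g" format keeps at most six significant digits, which is usually plenty for axis labels.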
Upvotes: 3
Reputation: 634
First, your sizes are being read as strings, so any ordering of them is not meaningful, and the spacing between the points is not representative either.
Also, in general I'd say it's poor practice to have different units on the same axis. It is better to convert everything to the same unit:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame([['abc1.txt', '2.1 MB'],
['abc2.txt', '1.0 MB'],
['abc3.txt', '1.5 MB'],
['abc4.txt', '767.9 KB']], columns=["file", 'size'])
# This list comprehension splits each string into number and unit, converts the number to a float,
# and divides it by 1000 if the unit is 'KB', so every value ends up in MB.
df['size_float'] = [float(x[0])/1000 if x[1]=='KB' else float(x[0]) for x in df['size'].str.split()]
plt.plot(df['file'],df['size_float'])
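Once the sizes are numeric, the rows can also be sorted, which is what the question asks for. A minimal sketch, assuming every entry has the form "<number> <unit>" with decimal units; the TO_MB table and the size_mb column name are illustrative choices, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame([['abc1.txt', '2.1 MB'],
                   ['abc2.txt', '1.0 MB'],
                   ['abc3.txt', '1.5 MB'],
                   ['abc4.txt', '767.9 KB']], columns=['file', 'size'])

# Assumed conversion factors to MB (decimal units); extend as needed.
TO_MB = {'KB': 1 / 1000, 'MB': 1.0, 'GB': 1000.0}

# expand=True yields a DataFrame: column 0 holds the number, column 1 the unit.
parts = df['size'].str.split(expand=True)
df['size_mb'] = parts[0].astype(float) * parts[1].map(TO_MB)

# Sort on the numeric column so the KB entry comes before the MB entries.
df_sorted = df.sort_values('size_mb').reset_index(drop=True)
```

Plotting df_sorted['file'] against df_sorted['size_mb'] then draws the files in ascending order of size.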
Upvotes: 1