Reputation: 13686
I'm trying to plot a histogram from a DataFrame column, but I can't seem to convert the data into an object type that matplotlib can handle. Here are some failed attempts. How do I fix this?
And more generally, how do you typically troubleshoot something like this?
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
filter(lambda v: v > 0, df['foo_col']).hist(bins=100)
---> 10 filter(lambda v: v > 0, df['foo_col']).hist(bins=100)
AttributeError: 'filter' object has no attribute 'hist'
hist(filter(lambda v: v > 0, df['foo_col']), bins=100)
---> 10 hist(filter(lambda v: v > 0, df['foo_col']), bins=100)
TypeError: 'Series' object is not callable
Upvotes: 0
Views: 13954
Reputation: 13175
By all accounts, filter is lucky to be part of the standard library. IIUC, you just want to filter your dataframe to plot a histogram of values > 0. Pandas has its own syntax for that:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.random.randint(-50, 1000, 10000)
df = pd.DataFrame({'some_data': data})
# Boolean mask: keep only the rows where some_data is strictly positive
df[df['some_data'] > 0].hist(bins=100)
plt.show()
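Applied to the column name from the question, and plotting just that one column, the same idea might look like the sketch below. The data is made up and only the name foo_col comes from the question. It also hints at why the two attempts failed: in Python 3, filter returns a lazy iterator rather than a Series, and the second error suggests (an assumption on my part, since the session isn't shown) that the name hist had been bound to a Series somewhere earlier.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data; only the column name 'foo_col' comes from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({'foo_col': rng.integers(-50, 1000, size=10_000)})

# In Python 3, filter() returns a lazy iterator, not a Series,
# so the result has no .hist method.
lazy = filter(lambda v: v > 0, df['foo_col'])

# A boolean mask keeps the result a Series, so .hist is available.
positive = df.loc[df['foo_col'] > 0, 'foo_col']
ax = positive.hist(bins=100)

# Roughly equivalent call through matplotlib directly, on a new figure.
fig, ax2 = plt.subplots()
counts, edges, _ = ax2.hist(positive, bins=100)
```

Call plt.show() afterwards if you are not in a notebook. When an AttributeError like the first one turns up, checking type(...) of the intermediate object is usually the quickest way to see what you are actually holding.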
Note that the boolean-mask approach will typically run much faster than Python's builtin filter (it doesn't make much difference in my trivial example, but it will with bigger datasets). It's important to use pandas methods with dataframes wherever possible because, in many cases, the calculation is vectorized and runs in optimised compiled code rather than in a Python-level loop.
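If you want to check the speed difference on your own machine, a rough timing sketch could look like this. The series size and repeat count are arbitrary choices of mine, and the actual numbers will depend on your hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test data: 200,000 integers between -50 and 999.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(-50, 1000, size=200_000))

def with_builtin_filter():
    # Python-level loop: the lambda is called once per element.
    return list(filter(lambda v: v > 0, s))

def with_boolean_mask():
    # Vectorized comparison and selection inside pandas/NumPy.
    return s[s > 0]

t_builtin = timeit.timeit(with_builtin_filter, number=3)
t_mask = timeit.timeit(with_boolean_mask, number=3)
print(f"builtin filter: {t_builtin:.3f} s, boolean mask: {t_mask:.3f} s")
```

Both functions select the same values; only the mask version returns a Series that you can call .hist on directly.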
Upvotes: 2