Reputation: 3801
I have a dataframe 'grid' that looks like this:
COLUMN_NM  DISTINCT_COUNT  MAX_COL_VALUE  MIN_COL_VALUE  NULL_COUNT
COL_A      123             456            111            56
COL_B      15678           222            4              3456
COL_C      18994           456            76             43
...
The data in COLUMN_NM is dynamic, as this DataFrame gets loaded with different tables for analysis. What I want to do is graph whatever data currently resides in the DataFrame: one bar graph for DISTINCT_COUNT, another for MAX_COL_VALUE, and so on, each broken down per column, so the COLUMN_NM values would be represented along the x-axis.
What I have so far is clearly incorrect, but it should give you some idea of what I am trying to do.
distinct = grid[('COLUMN_NM', 'DISTINCT_COUNT')].plot(kind=bar)
max_col = grid[('COLUMN_NM', 'MAX_COL_VALUE')].plot(kind=bar)
min_col = grid[('COLUMN_NM', 'MIN_COL_VALUE')].plot(kind=bar)
null_cnt = grid[('COLUMN_NM', 'NULL_COUNT')].plot(kind=bar)
I have all the necessary import statements. I want the output to be 4 graphs, and I can specify more bar chart parameters after I get this working. Also, would it be easier to wrap this in a for loop or a function?
Upvotes: 0
Views: 408
Reputation: 51335
Yes, I'd recommend doing this in a loop:
for col in ['DISTINCT_COUNT', 'MAX_COL_VALUE', 'MIN_COL_VALUE', 'NULL_COUNT']:
    grid[['COLUMN_NM', col]].set_index('COLUMN_NM').plot.bar(title=col)
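For anyone who wants to run the loop above end to end, here is a minimal self-contained sketch. It assumes the three sample rows from the question as the data, and the helper name plot_metrics and the explicit plt.subplots() call per chart are my own additions for illustration, not something the question requires.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the first three rows shown in the question
grid = pd.DataFrame({
    "COLUMN_NM": ["COL_A", "COL_B", "COL_C"],
    "DISTINCT_COUNT": [123, 15678, 18994],
    "MAX_COL_VALUE": [456, 222, 456],
    "MIN_COL_VALUE": [111, 4, 76],
    "NULL_COUNT": [56, 3456, 43],
})


def plot_metrics(df, label_col="COLUMN_NM"):
    """Draw one bar chart per remaining column, with label_col on the x-axis.

    plot_metrics is a hypothetical helper name used only for this sketch.
    """
    indexed = df.set_index(label_col)  # index values become the x-axis labels
    axes = {}
    for col in indexed.columns:
        fig, ax = plt.subplots()  # a fresh figure for each metric
        indexed[col].plot.bar(ax=ax, title=col)
        axes[col] = ax
    return axes


axes = plot_metrics(grid)
```

Because the loop iterates over whatever columns remain after set_index, it should keep working if the set of metrics changes. In an interactive session, call plt.show() afterwards to display the four figures. Note also that the kind argument in the question needs to be the string 'bar' rather than a bare name.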
The issues with your code were:
grid[('COLUMN_NM', 'DISTINCT_COUNT')] won't work because you are indexing with a tuple, [(...)]; you want a list, [[...]], to select a subset of columns.
You need to set the x-axis column (COLUMN_NM) as the index.
Upvotes: 2