Reputation: 1946

Formatting groupby output in Python

I have a DataFrame that looks like:

import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

I'm counting how many times each unique value of 'type' occurs, grouped by prefix, using the following:

sub_cat = ['critical::',
           'hardware::',
           'software::'
           ]

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()
    if len(count) > 0:
        print(count)
    else:
        print(cat, '0')

The results are correct, but the output is messy:

type
critical::issue::A    2
dtype: int64
type
hardware::issue::B    1
dtype: int64
software:: 0

I'd like to format the output to make it more readable, like this:

type
critical::issue::A    2
hardware::issue::B    1
software:: 0

Any suggestions?

Upvotes: 0

Answers (4)

Anton vBR

Reputation: 18916

An alternative solution, if you just change:

print(count)

To:

print(count.to_string(header=False))

You get:

critical::issue::A    2
hardware::issue::B    1
software:: 0

Then add a print("type") before the loop and you get exactly the output you want.
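Putting the two suggestions together, a minimal sketch of the full loop (the wrapper function `print_report` is a hypothetical name added only so the output is easy to reuse). `header=False` drops the index name from each chunk, and `Series.to_string` omits the `dtype` line by default:

```python
import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

sub_cat = ['critical::', 'hardware::', 'software::']


def print_report(df, sub_cat):
    # hypothetical helper: one header, then one line per 'type' value
    print("type")
    for cat in sub_cat:
        # rows whose 'type' starts with the current prefix
        x = df[df['type'].str.startswith(cat)]
        count = x.groupby('type').size()
        if len(count) > 0:
            # header=False suppresses the index name ("type") in each chunk
            print(count.to_string(header=False))
        else:
            print(cat, '0')


print_report(df, sub_cat)
```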

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210842

Consider this Pandas approach:

In [79]: res = df.groupby('type').size()

In [80]: res
Out[80]:
type
critical::issue::A    2
hardware::issue::B    1
dtype: int64

In [81]: s = pd.Series(sub_cat)

In [82]: idx = s[~s.isin(df.type.str.extract(r'(\w+::)', expand=False).unique())].values

In [83]: res = pd.concat([res, pd.Series([0] * len(idx), index=idx)])

In [84]: res
Out[84]:
critical::issue::A    2
hardware::issue::B    1
software::            0
dtype: int64
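`Series.append` was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the same idea needs `pd.concat`. A minimal script version of the approach above, assuming the three-row `df` from the question (the names `present` and `missing` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

sub_cat = ['critical::', 'hardware::', 'software::']

# count every distinct 'type' value in a single groupby pass
res = df.groupby('type').size()

# prefixes such as 'critical::' that actually occur in the data
present = set(df['type'].str.extract(r'(\w+::)', expand=False).unique())

# prefixes from sub_cat with no matching rows get an explicit 0
missing = [cat for cat in sub_cat if cat not in present]
res = pd.concat([res, pd.Series(0, index=missing, dtype='int64')])

# to_string() prints the index/value pairs without the trailing dtype line
print(res.to_string())
```

Because the two pieces have different index names, the concatenated index has no name, so no "type" header line is printed.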

Upvotes: 0

sbond

Reputation: 178

Here is your code with suggested changes:

import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-02', 'critical::issue::B', 'version3'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])  

sub_cat = ['critical::',
           'hardware::',
           'software::']

print("type")

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()

    # 'count' is a Series indexed by the 'type' values
    for type_name, n in count.items():
        print("{}\t{}".format(type_name, n))

    if len(count) == 0:
        print("{}\t{}".format(cat, 0))

It produces:

type
critical::issue::A      2
critical::issue::B      1
hardware::issue::B      1
software::      0
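Tab separators can misalign when labels have different lengths (as with "software::" above). A minimal sketch that pads every label to the longest one instead, assuming the same `df` and `sub_cat` as in this answer (`rows` and `width` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-02', 'critical::issue::B', 'version3'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

sub_cat = ['critical::', 'hardware::', 'software::']

rows = []  # (label, count) pairs in print order
for cat in sub_cat:
    count = df[df['type'].str.startswith(cat)].groupby('type').size()
    if len(count) == 0:
        rows.append((cat, 0))
    else:
        rows.extend(count.items())

# the longest label sets the width of the first column
width = max(len(label) for label, _ in rows)
lines = ["type"] + [f"{label:<{width}}  {n}" for label, n in rows]
print("\n".join(lines))
```

Collecting the rows first means the width is known before anything is printed, at the cost of holding all lines in memory (negligible here).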

Upvotes: 0

nanojohn

Reputation: 582

You could loop over the entries of the count Series returned by groupby and print them one at a time:

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]
    count = x.groupby('type').size()
    if len(count) > 0:
        for ind, row in count.items():
            print(ind, row)
    else:
        print(cat, '0')

Output is as follows:

critical::issue::A 2
hardware::issue::B 1
software:: 0

Upvotes: 0
