skad00sh
skad00sh

Reputation: 171

Calulating number of series in python pandas

I wanted to calculate the number of series present in the given data.

I need this information for the time-series count.

Here is the data

Here I would like the user to select how to check series.

e.g. Series can be Region > Product > Country (please take this selection for this code also)

Now, series are:

  1. Asia > A > India
  2. Asia > A > Thailand
  3. Asia > B > India
  4. Asia > B > Thailand
  5. Asia > D > Japan
  6. Europe > A > Italy
  7. Europe > A > Turkey
  8. Europe > B > Italy

So I need an answer as '8' since there are 8 series for selected hierarchy.

I was successfully able to do this by converting CSV to excel and then counting all series. But it is very time consuming if I have large data.

import pandas as pd
import numpy as np

df = pd.read_csv("data.csv")
state = df.unstack('Sales')
set1= list(set(state))

pivot = pd.pivot_table(df,index=["Region","Country","Product"],values="Sales",aggfunc=np.sum)
df1 = pd.DataFrame(pivot)
df1.to_excel("output.xlsx")

df2 = pd.read_excel("output.xlsx")

cols = list(df2.columns)
count_TS = 0
for i in cols:
    if i =="":
        continue
    count_TS += df2[i].count()


print("Total Timeseries = ",count_TS + 1 -(df2['Sales'].count()))

Note: Hierarchy used in the above code is Region > Country > Product

Is it possible to do this without creating new excel file?

Here is the numpy array for you:

array([['Asia', 'India', 'A', 200],
       ['Asia', 'Thailand', 'A', 150],
       ['Asia', 'India', 'B', 175],
       ['Asia', 'Thailand', 'B', 225],
       ['Asia', 'Japan', 'D', 325],
       ['Europe', 'Italy', 'A', 120],
       ['Europe', 'Turkey', 'A', 130],
       ['Europe', 'Italy', 'B', 160]], dtype=object)

Upvotes: 1

Views: 72

Answers (1)

ansev
ansev

Reputation: 30920

IIUC, You want GroupBy.ngroups:

df.groupby(['Region','Country','Sales']).ngroups
#8 Output

Upvotes: 1

Related Questions