Balaji Krishnamoorthy
Balaji Krishnamoorthy

Reputation: 11

Python pandas unique count on the columns which is populated dynamically

I am new to python & pandas , i have an issue when i populate dynamically df2 (dataframes) it is not giving proper results on the unique count. I have product file where Product_ID , VendorID is the key (composite key)

df=pd.read_csv("Product_file.csv")
df2 = df['Product_ID'] + df['VendorID']
print(df2)
print('type....' , type(df2))

uniq = df2.unique().size

print('Uniue values ' , str(uniq)) >>> getting the result as 9 , it is expected .

however when i populate dynamically by reading from the table (sqlite3) which has PK as one of the column which is getting in to index 4

Index =4 will look like below

PK-KEY(Table-column) 
Product_ID, VendorID
Store_no, Loc_no
Dist_no


result_list = row[4].strip().split(',')
     #print('length of list  :', len(result_list))
     values = 0
     unique_key = ' '
     overall_key = ' '
     values = range(len(result_list))
     index = 0
     i = 1
     

     for index in values:
          #print('index  value >>>  ', i)
          if i == len(result_list):
               df_concat = df_concat + 'df['+ "'" + result_list[index] + "'" + ']'
               break
          elif (i < len(result_list)):
               df_concat = df_concat + ' df['+ "'" + result_list[index] + "'" + ']' + ' +'
          i = i + 1


print(df_concat)
#print(type(df_concat))
df_strip = df_concat.strip(' ')
print('df_strip ->>>>>'  , df_strip)
df_Series = pd.Series(df_strip)

unique_key =  df_Series.unique().size
overall_key = df_Series.count()

Getting value as 1 , not sure how to resolve it .Your help is very much appreciated.

Upvotes: 0

Views: 46

Answers (1)

Balaji Krishnamoorthy
Balaji Krishnamoorthy

Reputation: 11

I got the answer by searching thru stack over flow , PK keys stored in the result_list (list) type and i have used the .apply method to get the counts.

df['Combined'] = df[result_list].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)
         print('combined values =======  ', df['Combined'])
         overall_count = len(df['Combined'])
         unique_count =  len(df['Combined'].unique())

Upvotes: 1

Related Questions