Mardibak
Mardibak

Reputation: 67

Find repeated rows in Excel and add into dictionary?

I have 1 sheet with 2 columns

      xCode  xYear   Repeated
1.    100    1900    3
2.    100    1900    3
3.    100    1934    3
4.    200    1921    1
5.    157    1945    1

How can I find repeated rows in 'A' column

Add this rows in dictionary (Key = 'A' value & Value = 'B' value)

import pandas as pd
data = pd.read_excel (r'./1.xlsx')
df = pd.DataFrame(data, columns= ['xcode', 'xyear'])
print (df)

Upvotes: 0

Views: 86

Answers (2)

Pierre D
Pierre D

Reputation: 26251

To get the values in the "result image" (but please, next time, do not upload images of code/data/errors when asking a question), you can do the following.

# simplified sample setup
df = pd.DataFrame({
    'xCode': [100,108,210,100,100],
    'xYear': [1990, 1956, 1970, 1976, 1940],
})

Count duplicated xCode values, and indicate that value for each row:

df2 = df.assign(repeated=df.groupby('xCode').transform('count')).sort_values('xCode')
>>> df2
   xCode  xYear  repeated
0    100   1990         3
3    100   1976         3
4    100   1940         3
1    108   1956         1
2    210   1970         1

You could transform that result into a dict as well:

>>> df2.set_index('xCode').to_dict()
{'xYear': {100: 1940, 108: 1956, 210: 1970},
 'repeated': {100: 3, 108: 1, 210: 1}}

Upvotes: 1

Rama Salahat
Rama Salahat

Reputation: 212

To get duplicates in a dataframe you can use duplicated :

df["duplicated_flag"] = df.duplicated(subset=['xcode'])
# get the data that is duplicated only
duplicates = df[df["duplicated_flag"]]
# create a dictionary with the tuples of duplicates
result = dict(zip(duplicates["xcode"], duplicates["xyear"]))

Upvotes: 2

Related Questions