Find repeated rows in Excel and add into dictionary?

Question

I have 1 sheet with 2 columns

      xCode  xYear   Repeated
1.    100    1900    3
2.    100    1900    3
3.    100    1934    3
4.    200    1921    1
5.    157    1945    1

How can I find repeated rows in 'A' column

Add this rows in dictionary (Key = 'A' value & Value = 'B' value)

import pandas as pd
data = pd.read_excel (r'./1.xlsx')
df = pd.DataFrame(data, columns= ['xcode', 'xyear'])
print (df)

Pierre D · Accepted Answer

To get the values in the "result image" (but please, next time, do not upload images of code/data/errors when asking a question), you can do the following.

# simplified sample setup
df = pd.DataFrame({
    'xCode': [100,108,210,100,100],
    'xYear': [1990, 1956, 1970, 1976, 1940],
})

Count duplicated xCode values, and indicate that value for each row:

df2 = df.assign(repeated=df.groupby('xCode').transform('count')).sort_values('xCode')
>>> df2
   xCode  xYear  repeated
0    100   1990         3
3    100   1976         3
4    100   1940         3
1    108   1956         1
2    210   1970         1

You could transform that result into a dict as well:

>>> df2.set_index('xCode').to_dict()
{'xYear': {100: 1940, 108: 1956, 210: 1970},
 'repeated': {100: 3, 108: 1, 210: 1}}

Find repeated rows in Excel and add into dictionary?

Answers (2)

Related Questions