Reputation: 67
I have 1 sheet with 2 columns
xCode xYear Repeated
1. 100 1900 3
2. 100 1900 3
3. 100 1934 3
4. 200 1921 1
5. 157 1945 1
How can I find repeated rows in 'A' column
Add this rows in dictionary (Key = 'A' value & Value = 'B' value)
import pandas as pd
data = pd.read_excel (r'./1.xlsx')
df = pd.DataFrame(data, columns= ['xcode', 'xyear'])
print (df)
Upvotes: 0
Views: 86
Reputation: 26251
To get the values in the "result image" (but please, next time, do not upload images of code/data/errors when asking a question), you can do the following.
# simplified sample setup
df = pd.DataFrame({
'xCode': [100,108,210,100,100],
'xYear': [1990, 1956, 1970, 1976, 1940],
})
Count duplicated xCode
values, and indicate that value for each row:
df2 = df.assign(repeated=df.groupby('xCode').transform('count')).sort_values('xCode')
>>> df2
xCode xYear repeated
0 100 1990 3
3 100 1976 3
4 100 1940 3
1 108 1956 1
2 210 1970 1
You could transform that result into a dict
as well:
>>> df2.set_index('xCode').to_dict()
{'xYear': {100: 1940, 108: 1956, 210: 1970},
'repeated': {100: 3, 108: 1, 210: 1}}
Upvotes: 1
Reputation: 212
To get duplicates in a dataframe you can use duplicated
:
df["duplicated_flag"] = df.duplicated(subset=['xcode'])
# get the data that is duplicated only
duplicates = df[df["duplicated_flag"]]
# create a dictionary with the tuples of duplicates
result = dict(zip(duplicates["xcode"], duplicates["xyear"]))
Upvotes: 2