user13255600
user13255600

Reputation: 27

How to check if value from one column is equal to value in another columns data-frame

I have two separate data frames df and xls. Xls is a data frame that contain unique IDs that I would like to see how many times occur in my df data frame (~650,000 rows) and then create an occurrence column that would keep track of the amount of times that our unique IDs from our xls dataframe are appearing in the df dataframe.

xls = {'Unique ID': ['a', 'b', 'c', 'd', 'e'}
df = {'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']} 
result_df = {'Contingency': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'd', 'b'],'Occurences': [4, 5, 0, 1, 0]  

Ultimately, I would just like to keep a track of which Unique ID is appearing the most in DF given its unique ID.

Upvotes: 2

Views: 98

Answers (2)

theDBA
theDBA

Reputation: 239

if you want to sort by unique IDs

  results_df = df['Contingency'].value_counts().sort_index()

if you want to sort by the frequency of occurrence.

 results_df =  df['Contingency'].value_counts()

Upvotes: 1

SimonR
SimonR

Reputation: 1824

df.groupby('Contingency').count() should produce the Series you are looking for, without the need for the xls dataframe containing the unique IDs.

Edit:

If your 'df' dataframe only has the 'Contingency' column, you'll need a second column to apply the count() to, like this:

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})
df['Occurances'] = 1
result = df.groupby('Contingency').count()

Otherwise you can just do:

result = pd.DataFrame(df.Contingency.value_counts())

For the same result.

Then you can sort the values : result.sort_values(by = 'Contingency', ascending=False)

Upvotes: 5

Related Questions