Reputation: 27
I have two separate data frames, df and xls. xls contains unique IDs, and I would like to count how many times each one occurs in my df data frame (~650,000 rows), then create an occurrence column that keeps track of how many times each unique ID from the xls dataframe appears in df.
xls = {'Unique ID': ['a', 'b', 'c', 'd', 'e']}
df = {'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']}
result_df = {'Unique ID': ['a', 'b', 'c', 'd', 'e'], 'Occurrences': [2, 3, 2, 2, 1]}
Ultimately, I would just like to keep track of which unique ID appears the most often in df.
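For reference, one way to sketch what is being asked (a minimal example, assuming pandas; `map` plus `value_counts` attaches a count to exactly the IDs listed in xls):

```python
import pandas as pd

# Sample data from the question
xls = pd.DataFrame({'Unique ID': ['a', 'b', 'c', 'd', 'e']})
df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a',
                                   'b', 'c', 'e', 'd', 'b']})

# Tally every ID in df, then look up each xls ID's count;
# fillna(0) covers IDs that never appear in df
counts = df['Contingency'].value_counts()
xls['Occurrences'] = xls['Unique ID'].map(counts).fillna(0).astype(int)
```

With the sample data this gives a=2, b=3, c=2, d=2, e=1.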
Upvotes: 2
Views: 98
Reputation: 239
results_df = df['Contingency'].value_counts().sort_index()  # counts, sorted by ID
results_df = df['Contingency'].value_counts()  # counts, sorted by frequency (descending)
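With the question's sample data, the `sort_index()` variant looks like this (a sketch, assuming pandas):

```python
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a',
                                   'b', 'c', 'e', 'd', 'b']})

# value_counts() tallies each ID; sort_index() orders the result
# alphabetically by ID rather than by count
results_df = df['Contingency'].value_counts().sort_index()
```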
Upvotes: 1
Reputation: 1824
df.groupby('Contingency').count()
should produce the counts you are looking for, without needing the xls dataframe of unique IDs.
Edit:
If your 'df' dataframe only has the 'Contingency' column, you'll need a second column to apply the count() to, like this:
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'e', 'd', 'b']})
df['Occurrences'] = 1  # dummy column for count() to aggregate
result = df.groupby('Contingency').count()
Otherwise you can just do:
result = pd.DataFrame(df.Contingency.value_counts())
for the same result.
Then you can sort the values: result.sort_values(by='Contingency', ascending=False)
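Since the question ultimately asks which ID appears most often, one short addition (a sketch, assuming pandas and the question's sample data): `idxmax` on the counts returns the top ID directly.

```python
import pandas as pd

df = pd.DataFrame({'Contingency': ['a', 'b', 'c', 'd', 'a',
                                   'b', 'c', 'e', 'd', 'b']})

counts = df['Contingency'].value_counts()
most_common = counts.idxmax()  # the ID with the highest count
```

Here `most_common` is 'b', which appears 3 times.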
Upvotes: 5