Reputation: 2005
I have two csv files. Contacts and Users.
How I load data into dataframes and merge them
First, I load a dataframe with the name of the users:
import pandas as pd
import numpy as np
df_users= pd.read_csv('./Users_001.csv',sep=',',usecols=[0,2,3])
Then I load the information from contacts of each user
df_contacts = pd.read_csv('./Contacts_001.csv',sep=',',usecols=[0,1,5,48,55,56,57,83,58])
df_users columns name are: user_id, Name, Surname
df_contacts columns name are: Contact ID, id user owner, fullname, qualification, ...
I want to merge both dataframes using user_id
and 'id user owner'
since they represent the same information. To to this I first change the name of the columns on df_contacts
and then I merge
dfcontactos.columns = ['ID de Contacto','user_id','fullname','qualification','accesibility' ... ]
df_us_cont = pd.merge(dfcontactos,df_usuarios,on='user_id')
Now df_us_cont
has the information from users and contacts.
What I want to do
There are only 18 user_id
but there are 500 contacts. For each user I want to know:
Number of contacts with qualification < 100
For the contacts that have qualification <100
How many contacts have accesibility >= 4
Accesibility is a discrete number (0-5))
What I have tried and fail
df_qua_lower100 = df_us_cont[df_us_cont['qualification']<100]
df_qua_lower100['user_id'].value_counts()
So far with this I am able to get the information on how many contacts with qualification<100
has each user_id
. But I am unable to look how many have 'accesibility>=4
'
I have tried to explain the best I could
Upvotes: 1
Views: 138
Reputation: 2039
First thing you can merge without changing column names
df_us_cont = dfcontactos.merge(dfcontactos,left_on='id user owner',right_on='user_id')
You can add as many conditions as you want if you use loc
df_us_cont.loc[(df_us_cont['qualification']<100) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()
Number of contacts with qualification > 100 and < 300
df_us_cont.loc[(df_us_cont['qualification']>100) &(df_us_cont['qualification']<300) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()
Number of contacts with qualification > 300
df_us_cont.loc[(df_us_cont['qualification']>300) & (df_us_cont['accesibility']>=4),'user_id'].value_counts()
Upvotes: 1