ch36r5s
ch36r5s

Reputation: 129

python & pandas - How to calculate frequency under conditions in columns in DataFrame?

I have a series of data in a DataFrame called frames:

   NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait  NoDemande Periods
0   000001   +        287Véh          000087     000079    1   42196000013 Matin 
1   000001   -        287Véh          000087     000079    1   42196000013 Matin 
2   000314   +        263Véh          000077     006470    1   42196000002 Matin 
3   002372   +        287Véh          000087     002932    1   42196000016 Matin 
4   000466   +        287Véh          000087     002932    1   42196000015 Matin 
5   000314   -        263Véh          000077     000456    1   42196000002 Matin 
6   000466   -        287Véh          000087     004900    1   42196000015 Matin 
7   002372   -        287Véh          000087     007072    1   42196000016 Matin 
8   002641   +        263Véh          000077     007225    1   42196000004 Soir 
9   002641   -        263Véh          000077     000889    1   42196000004 Soir 
10  000382   +        263Véh          000077     002095    1   42196000006 Soir 
11  002641   +        287Véh          000087     000889    1   42196000019 Soir 
12  000382   -        263Véh          000077     006168    1   42196000006 Soir 
13  002641   -        287Véh          000087     007225    1   42196000019 Soir 
14  001611   +        287Véh          000087     004236   -1   42196000021 Soir 
15  002785   +        263Véh          000077     007482    1   42196000007 Soir 
16  002372   +        287Véh          000087     007072    1   42196000022 Soir 
17  002785   -        263Véh          000077     007483    1   42196000007 Soir 
18  000466   +        287Véh          000087     004900    1   42196000023 Soir
19  000382   +        263Véh          000077     006168    1   42196000008 Soir 

For each Usager, depending on Sens and Periods, they can have more than one related address. I want to know for all the Usager, how many address do they have and the frequency of each address. I used frames.set_index(['NoUsager','NoAdresse']) to make it looks like:


EDIT

New pic

I don't want all the other columns but only a new one with the result of frequency. In which way I can do it? Can I use pivot() to do it?

Any help will be really appreciated!

Upvotes: 1

Views: 397

Answers (1)

jezrael
jezrael

Reputation: 863531

I think you need groupby by columns which will be indexes (NoUsager,Sens,Periods) in output df. Then need add column (NoAdresse) as last item in list in groupby, which is converted by unstack to columns in output. And you need aggregate by size.

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse']).size().unstack(fill_value=0)
print (df)
NoAdresse              79    456   889   2095  2932  4236  4900  6168  6470  \
NoUsager Sens Periods                                                         
1        +    Matin       1     0     0     0     0     0     0     0     0   
         -    Matin       1     0     0     0     0     0     0     0     0   
314      +    Matin       0     0     0     0     0     0     0     0     1   
         -    Matin       0     1     0     0     0     0     0     0     0   
382      +    Soir        0     0     0     1     0     0     0     1     0   
         -    Soir        0     0     0     0     0     0     0     1     0   
466      +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     1     0     0   
         -    Matin       0     0     0     0     0     0     1     0     0   
1611     +    Soir        0     0     0     0     0     1     0     0     0   
2372     +    Matin       0     0     0     0     1     0     0     0     0   
              Soir        0     0     0     0     0     0     0     0     0   
         -    Matin       0     0     0     0     0     0     0     0     0   
2641     +    Soir        0     0     1     0     0     0     0     0     0   
         -    Soir        0     0     1     0     0     0     0     0     0   
2785     +    Soir        0     0     0     0     0     0     0     0     0   
         -    Soir        0     0     0     0     0     0     0     0     0   

NoAdresse              7072  7225  7482  7483  
NoUsager Sens Periods                          
1        +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
314      +    Matin       0     0     0     0  
         -    Matin       0     0     0     0  
382      +    Soir        0     0     0     0  
         -    Soir        0     0     0     0  
466      +    Matin       0     0     0     0  
              Soir        0     0     0     0  
         -    Matin       0     0     0     0  
1611     +    Soir        0     0     0     0  
2372     +    Matin       0     0     0     0  
              Soir        1     0     0     0  
         -    Matin       1     0     0     0  
2641     +    Soir        0     1     0     0  
         -    Soir        0     1     0     0  
2785     +    Soir        0     0     1     0  
         -    Soir        0     0     0     1  

If need reset index:

df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse'])
       .size()
       .unstack(fill_value=0)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

Another solution with crosstab:

df = pd.crosstab([df.NoUsager,df.Sens,df.Periods], df.NoAdresse)
       .reset_index()
       .rename_axis(None, axis=1)

print (df)
    NoUsager Sens Periods  79  456  889  2095  2932  4236  4900  6168  6470  \
0          1    +   Matin   1    0    0     0     0     0     0     0     0   
1          1    -   Matin   1    0    0     0     0     0     0     0     0   
2        314    +   Matin   0    0    0     0     0     0     0     0     1   
3        314    -   Matin   0    1    0     0     0     0     0     0     0   
4        382    +    Soir   0    0    0     1     0     0     0     1     0   
5        382    -    Soir   0    0    0     0     0     0     0     1     0   
6        466    +   Matin   0    0    0     0     1     0     0     0     0   
7        466    +    Soir   0    0    0     0     0     0     1     0     0   
8        466    -   Matin   0    0    0     0     0     0     1     0     0   
9       1611    +    Soir   0    0    0     0     0     1     0     0     0   
10      2372    +   Matin   0    0    0     0     1     0     0     0     0   
11      2372    +    Soir   0    0    0     0     0     0     0     0     0   
12      2372    -   Matin   0    0    0     0     0     0     0     0     0   
13      2641    +    Soir   0    0    1     0     0     0     0     0     0   
14      2641    -    Soir   0    0    1     0     0     0     0     0     0   
15      2785    +    Soir   0    0    0     0     0     0     0     0     0   
16      2785    -    Soir   0    0    0     0     0     0     0     0     0   

    7072  7225  7482  7483  
0      0     0     0     0  
1      0     0     0     0  
2      0     0     0     0  
3      0     0     0     0  
4      0     0     0     0  
5      0     0     0     0  
6      0     0     0     0  
7      0     0     0     0  
8      0     0     0     0  
9      0     0     0     0  
10     0     0     0     0  
11     1     0     0     0  
12     1     0     0     0  
13     0     1     0     0  
14     0     1     0     0  
15     0     0     1     0  
16     0     0     0     1  

EDIT by comment:

I think you need only aggregate size:

df = df.groupby(['NoUsager','NoAdresse']).size().reset_index(name='Count')
print (df)
    NoUsager  NoAdresse  Count
0          1         79      2
1        314        456      1
2        314       6470      1
3        382       2095      1
4        382       6168      2
5        466       2932      1
6        466       4900      2
7       1611       4236      1
8       2372       2932      1
9       2372       7072      2
10      2641        889      2
11      2641       7225      2
12      2785       7482      1
13      2785       7483      1

If need set indexes, you can use another solution - rename Series name and then call to_frame:

df = df.groupby(['NoUsager','NoAdresse']).size().rename('Count').to_frame()
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1

Or add set_index:

df = df.groupby(['NoUsager','NoAdresse'])
       .size()
       .reset_index(name='Count')
       .set_index(['NoUsager','NoAdresse'])
print (df)
                    Count
NoUsager NoAdresse       
1        79             2
314      456            1
         6470           1
382      2095           1
         6168           2
466      2932           1
         4900           2
1611     4236           1
2372     2932           1
         7072           2
2641     889            2
         7225           2
2785     7482           1
         7483           1

Upvotes: 1

Related Questions