Reputation: 129
I have a series of data in a DataFrame called frames
:
NoUsager Sens IdVehiculeUtilise NoConducteur NoAdresse Fait NoDemande Periods
0 000001 + 287Véh 000087 000079 1 42196000013 Matin
1 000001 - 287Véh 000087 000079 1 42196000013 Matin
2 000314 + 263Véh 000077 006470 1 42196000002 Matin
3 002372 + 287Véh 000087 002932 1 42196000016 Matin
4 000466 + 287Véh 000087 002932 1 42196000015 Matin
5 000314 - 263Véh 000077 000456 1 42196000002 Matin
6 000466 - 287Véh 000087 004900 1 42196000015 Matin
7 002372 - 287Véh 000087 007072 1 42196000016 Matin
8 002641 + 263Véh 000077 007225 1 42196000004 Soir
9 002641 - 263Véh 000077 000889 1 42196000004 Soir
10 000382 + 263Véh 000077 002095 1 42196000006 Soir
11 002641 + 287Véh 000087 000889 1 42196000019 Soir
12 000382 - 263Véh 000077 006168 1 42196000006 Soir
13 002641 - 287Véh 000087 007225 1 42196000019 Soir
14 001611 + 287Véh 000087 004236 -1 42196000021 Soir
15 002785 + 263Véh 000077 007482 1 42196000007 Soir
16 002372 + 287Véh 000087 007072 1 42196000022 Soir
17 002785 - 263Véh 000077 007483 1 42196000007 Soir
18 000466 + 287Véh 000087 004900 1 42196000023 Soir
19 000382 + 263Véh 000077 006168 1 42196000008 Soir
For each Usager
, depending on Sens
and Periods
, they can have more than one related address. I want to know for all the Usager
, how many address
do they have and the frequency of each address.
I used frames.set_index(['NoUsager','NoAdresse'])
to make it looks like:
EDIT
I don't want all the other columns but only a new one with the result of frequency. In which way I can do it? Can I use pivot()
to do it?
Any help will be really appreciated!
Upvotes: 1
Views: 397
Reputation: 863531
I think you need groupby
by columns which will be indexes
(NoUsager
,Sens
,Periods
) in output df. Then need add column (NoAdresse
) as last item in list in groupby
, which is converted by unstack
to columns in output. And you need aggregate by size
.
df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse']).size().unstack(fill_value=0)
print (df)
NoAdresse 79 456 889 2095 2932 4236 4900 6168 6470 \
NoUsager Sens Periods
1 + Matin 1 0 0 0 0 0 0 0 0
- Matin 1 0 0 0 0 0 0 0 0
314 + Matin 0 0 0 0 0 0 0 0 1
- Matin 0 1 0 0 0 0 0 0 0
382 + Soir 0 0 0 1 0 0 0 1 0
- Soir 0 0 0 0 0 0 0 1 0
466 + Matin 0 0 0 0 1 0 0 0 0
Soir 0 0 0 0 0 0 1 0 0
- Matin 0 0 0 0 0 0 1 0 0
1611 + Soir 0 0 0 0 0 1 0 0 0
2372 + Matin 0 0 0 0 1 0 0 0 0
Soir 0 0 0 0 0 0 0 0 0
- Matin 0 0 0 0 0 0 0 0 0
2641 + Soir 0 0 1 0 0 0 0 0 0
- Soir 0 0 1 0 0 0 0 0 0
2785 + Soir 0 0 0 0 0 0 0 0 0
- Soir 0 0 0 0 0 0 0 0 0
NoAdresse 7072 7225 7482 7483
NoUsager Sens Periods
1 + Matin 0 0 0 0
- Matin 0 0 0 0
314 + Matin 0 0 0 0
- Matin 0 0 0 0
382 + Soir 0 0 0 0
- Soir 0 0 0 0
466 + Matin 0 0 0 0
Soir 0 0 0 0
- Matin 0 0 0 0
1611 + Soir 0 0 0 0
2372 + Matin 0 0 0 0
Soir 1 0 0 0
- Matin 1 0 0 0
2641 + Soir 0 1 0 0
- Soir 0 1 0 0
2785 + Soir 0 0 1 0
- Soir 0 0 0 1
If need reset index:
df = df.groupby(['NoUsager','Sens','Periods', 'NoAdresse'])
.size()
.unstack(fill_value=0)
.reset_index()
.rename_axis(None, axis=1)
print (df)
NoUsager Sens Periods 79 456 889 2095 2932 4236 4900 6168 6470 \
0 1 + Matin 1 0 0 0 0 0 0 0 0
1 1 - Matin 1 0 0 0 0 0 0 0 0
2 314 + Matin 0 0 0 0 0 0 0 0 1
3 314 - Matin 0 1 0 0 0 0 0 0 0
4 382 + Soir 0 0 0 1 0 0 0 1 0
5 382 - Soir 0 0 0 0 0 0 0 1 0
6 466 + Matin 0 0 0 0 1 0 0 0 0
7 466 + Soir 0 0 0 0 0 0 1 0 0
8 466 - Matin 0 0 0 0 0 0 1 0 0
9 1611 + Soir 0 0 0 0 0 1 0 0 0
10 2372 + Matin 0 0 0 0 1 0 0 0 0
11 2372 + Soir 0 0 0 0 0 0 0 0 0
12 2372 - Matin 0 0 0 0 0 0 0 0 0
13 2641 + Soir 0 0 1 0 0 0 0 0 0
14 2641 - Soir 0 0 1 0 0 0 0 0 0
15 2785 + Soir 0 0 0 0 0 0 0 0 0
16 2785 - Soir 0 0 0 0 0 0 0 0 0
7072 7225 7482 7483
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 1 0 0 0
12 1 0 0 0
13 0 1 0 0
14 0 1 0 0
15 0 0 1 0
16 0 0 0 1
Another solution with crosstab
:
df = pd.crosstab([df.NoUsager,df.Sens,df.Periods], df.NoAdresse)
.reset_index()
.rename_axis(None, axis=1)
print (df)
NoUsager Sens Periods 79 456 889 2095 2932 4236 4900 6168 6470 \
0 1 + Matin 1 0 0 0 0 0 0 0 0
1 1 - Matin 1 0 0 0 0 0 0 0 0
2 314 + Matin 0 0 0 0 0 0 0 0 1
3 314 - Matin 0 1 0 0 0 0 0 0 0
4 382 + Soir 0 0 0 1 0 0 0 1 0
5 382 - Soir 0 0 0 0 0 0 0 1 0
6 466 + Matin 0 0 0 0 1 0 0 0 0
7 466 + Soir 0 0 0 0 0 0 1 0 0
8 466 - Matin 0 0 0 0 0 0 1 0 0
9 1611 + Soir 0 0 0 0 0 1 0 0 0
10 2372 + Matin 0 0 0 0 1 0 0 0 0
11 2372 + Soir 0 0 0 0 0 0 0 0 0
12 2372 - Matin 0 0 0 0 0 0 0 0 0
13 2641 + Soir 0 0 1 0 0 0 0 0 0
14 2641 - Soir 0 0 1 0 0 0 0 0 0
15 2785 + Soir 0 0 0 0 0 0 0 0 0
16 2785 - Soir 0 0 0 0 0 0 0 0 0
7072 7225 7482 7483
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 0 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
11 1 0 0 0
12 1 0 0 0
13 0 1 0 0
14 0 1 0 0
15 0 0 1 0
16 0 0 0 1
EDIT by comment:
I think you need only aggregate size
:
df = df.groupby(['NoUsager','NoAdresse']).size().reset_index(name='Count')
print (df)
NoUsager NoAdresse Count
0 1 79 2
1 314 456 1
2 314 6470 1
3 382 2095 1
4 382 6168 2
5 466 2932 1
6 466 4900 2
7 1611 4236 1
8 2372 2932 1
9 2372 7072 2
10 2641 889 2
11 2641 7225 2
12 2785 7482 1
13 2785 7483 1
If need set indexes, you can use another solution - rename
Series
name and then call to_frame
:
df = df.groupby(['NoUsager','NoAdresse']).size().rename('Count').to_frame()
Count
NoUsager NoAdresse
1 79 2
314 456 1
6470 1
382 2095 1
6168 2
466 2932 1
4900 2
1611 4236 1
2372 2932 1
7072 2
2641 889 2
7225 2
2785 7482 1
7483 1
Or add set_index
:
df = df.groupby(['NoUsager','NoAdresse'])
.size()
.reset_index(name='Count')
.set_index(['NoUsager','NoAdresse'])
print (df)
Count
NoUsager NoAdresse
1 79 2
314 456 1
6470 1
382 2095 1
6168 2
466 2932 1
4900 2
1611 4236 1
2372 2932 1
7072 2
2641 889 2
7225 2
2785 7482 1
7483 1
Upvotes: 1