Aaron Wu
Aaron Wu

Reputation: 25

Using Panda Data Frame to process csv file

Could someone tell me how to convert this table to an other state?

row data:

enter image description here

Just like this:

                                frequency
event  day       enrollment_id           
access 01         14333                  2
                  14343                 14
                  14391                  1
                  14469                  1
                  35220                  9
                  35307                  2
                  35442                  2
                  61263                  1
                  140486                 1
       02         14320                  9
                  14321                  4
                  14322                 30
                  14324                  3
                  14325                  2
wiki   01         35517                  1
                  76933                  1
                  111900                 3
       02         47872                  1
                  47875                  1
                  148426                 1
       03         60416                  1
                  96487                  2
                  96502                  2
                  96505                  1
                  96513                  2
       2014-07-24 200888                 1

Upvotes: 1

Views: 44

Answers (2)

jezrael
jezrael

Reputation: 862471

I think if need change second level of MultiIndex to days:

a = df.index.get_level_values(0)
b = df.index.get_level_values(1).day
c = df.index.get_level_values(2)
df.index = pd.MultiIndex.from_arrays([a,b,c], names=['event', 'day', 'enrollment_id'])

But if need count second level for b use:

b = df.groupby(level=1).cumcount().add(1).astype(str).str.zfill(2)

If need create MultiIndex DataFrame:

df = pd.read_csv(file).set_index(['event', 'day', 'enrollment_id'])

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

You can use this function.

Demo:

# first copy the file contents into clipboard
In [184]: df = read_clipboard_mi(index_names_row=1)

In [185]: df
Out[185]:
                                 frequency
event  day        enrollment_id
access 01         14333                  2
                  14343                 14
                  14391                  1
                  14469                  1
                  35220                  9
                  35307                  2
                  35442                  2
...                                    ...
wiki   02         148426                 1
       03         60416                  1
                  96487                  2
                  96502                  2
                  96505                  1
                  96513                  2
       2014-07-24 200888                 1

[26 rows x 1 columns]

In [186]: df.index
Out[186]:
MultiIndex(levels=[['access', 'wiki'], ['01', '02', '03', '2014-07-24'], [14320, 14321, 14322, 14324, 14325, 14333, 14343, 14391, 14469, 352
20, 35307, 35442, 35517, 47872, 47875, 60416, 61263, 76933, 96487, 96502, 96505, 96513, 111900, 140486, 148426, 200888]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1
, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3], [5, 6, 7, 8, 9, 10, 11, 16, 23, 0, 1, 2, 3, 4, 12, 17, 22, 13, 14, 24, 15, 18, 19, 20, 21, 25]],
           names=['event', 'day', 'enrollment_id'])

PS you can easily adopt this function to read from a file instead of the clipboard...

Upvotes: 2

Related Questions