CEFA RAD
CEFA RAD

Reputation: 83

How do I create new columns from existing column values?

I have a dataframe in the form:

date         user  type
01/01/2021   Bob    A
02/01/2021   Lina   test
02/01/2021   Bob    test
06/01/2020   Bob    C 
08/01/2020   Lina   A
08/01/2020   Lina   A
08/01/2020   Bob    test
09/01/2020   Lina   test

I want to mutate the said dataframe so that it contains only test type rows moving all the other type values as a new columns with the count before the current test row so that I would getting something like:

date         user    type  A   C 
02/01/2021   Lina    test  0   0 
02/01/2021   Bob     test  1   0 
08/01/2020   Bob     test  1   1 
09/01/2020   Lina    test  2   0 

I'm trying to use pd.crosstab:

pd.crosstab([df['user'],df['date']], df['type'])

But I get this :

  User   date          A   C  test
  Bob    01/01/2021    1   0   0
         02/01/2021    0   0   1
         06/01/2020    0   1   0
         08/01/2020    0   0   1
  Lina   02/01/2021    0   0   1
         08/01/2020    1   0   0
         08/01/2020    1   0   0
         09/01/2020    0   0   1

How can I sum A and B value before each 1 in the test column?

Upvotes: 0

Views: 55

Answers (1)

bobby
bobby

Reputation: 78

To achieve something along the lines of what you are looking for, you need to group by user on your crosstab and then apply the cumsum aggregation function.

ct = pd.crosstab([df.user, df.date], df.type)
ctgb = ct.groupby("user").cumsum()

      User   date          A   C  test
      Bob    01/01/2021    1   0   0
             02/01/2021    1   0   1
             06/01/2020    1   1   0
             08/01/2020    1   1   1
      Lina   02/01/2021    0   0   1
             08/01/2020    2   0   0
             09/01/2020    2   0   2

Then to find out the sum of type A and C up until a test day, apply a mask filtering out days in which test == 1 in the crosstab frame.

cumulative_test_scores = ctgb.loc[ct.test == 1, ["A", "C"]]

      User   date          A   C
      Bob    02/01/2021    1   0
             08/01/2020    1   1
      Lina   02/01/2021    0   0
             09/01/2020    2   0

It's not the exact same format you desired however I believe it makes more sense than having the redundant type column and illustrates the results are for cumulative scores are for each user.

Edit

To get your desire format:

cumulative_test_scores.reset_index(level=0, inplace=True)
cumulative_test_scores.columns.name = None
cumulative_test_scores.insert(1, 'type', 'test')

Upvotes: 1

Related Questions