eieiboom
eieiboom

Reputation: 15

Create a column from Dataframes with different sizes conditionally in Python

I have Dataframes like followings:

df = pd.DataFrame({'buck' : ['3', '3', '3', '2', '2', '1', '1', '1', '0', '0'], 
                   'rank' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df2 = pd.DataFrame({'col_no' : ['mr.A', 'mr.B', 'mr.C', 'mr.D', 'mr.E', 'mr.F', 'mr.G'],
                    'grade' : ['H', 'H', 'M', 'M', 'L', 'L', 'L']})

I want to create a column in df using 'col_no' from df2, based on conditions from both df and df2. The conditions are as follows:

  1. df['buck'] == 3 and df2['grade'] == 'H'
  2. df['buck'] == 2 and df2['grade'] == 'M'
  3. (df['buck'] == 1 or df['buck'] == 0) and df2['grade'] == 'L'

If the condition is met ->> loop through df2['col_no'] and put corresponding values repeatedly.

For example, the only values for the condition no.1 are mr.A and mr.B.

The desired output would be:

buck     rank     col_no        
3        1        mr.A        
3        2        mr.B       
3        3        mr.A
2        4        mr.C
2        5        mr.D
1        6        mr.E        
1        7        mr.F     
1        8        mr.G        
0        9        mr.E
0        10       mr.F

I am new to Python and do not know what key words to solve this problem are. The only option I could think of is to write a hard code looping through each row and not using Panda coding style.

Hence, any suggestion would be really appreciated. Thanks in advance !!

Upvotes: 0

Views: 55

Answers (1)

Serge Ballesta
Serge Ballesta

Reputation: 148890

I would use a mapping and a transformation function

grades = {'3': 'H', '2': 'M', '1': 'L', '0': 'L'}

def trans(x):
    ln = len(x)
    grade = grades[x.name]         # find the corresponding grade
    data = df2[df2['grade'] == grade, 'col_no'].to_list() # extract matching data from df2
    data *= (ln // len(data)) + 1  # repeat as required
    return data[:ln]               # and return the expected len

It is then as simple as:

df.assign(col_no=df.groupby('buck')['buck'].transform(trans))

which gives as expected:

  buck  rank col_no
0    3     1   mr.A
1    3     2   mr.B
2    3     3   mr.A
3    2     4   mr.C
4    2     5   mr.D
5    1     6   mr.E
6    1     7   mr.F
7    1     8   mr.G
8    0     9   mr.E
9    0    10   mr.F

As above code requests one single column from the GroupBy object, the function called from transform receives each time a single Series where the name is the group identifier (here from '0' to '3') and the values the corresponding values from the column.

Upvotes: 1

Related Questions