yanadm
yanadm

Reputation: 707

Working with columns in pandas

This is a part of the table which I have:

type  n_b
sp     2
sp     2
sp     3
avn    2
avn    4
avn    3
psp    1
psp    3
psp    5
...

Also I have a data set:

d = pd.Series({'sp':['98,00', '0,00', '68,00'], 'psp':['17,00',  '7,60', '14,30'], 
           'avn':['15,00', '10,00', '4,30']})

I need to match the value from my data set in a new column "c_t" depending on the value in the column "type". That's what should be the result:

type  n_b    c_t
sp     2    98,00
sp     2     0,00
sp     3    68,00
avn    2    15,00
avn    4    10,00
avn    3     4,30
psp    1    17,00
psp    3     7,60
psp    5    14,30
...

My code looks like this:

d = pd.Series({'sp':['98,00', '0,00', '68,00'], 'psp':['17,00',  '7,60', '14,30'], 
           'avn':['15,00', '10,00', '4,30']})

df['c_t'] = df['type'].map(d)
print (df)

But it does not work as I need it

type  n_b    c_t
sp     2    [98,00, 0,00, 68,00]
sp     2    [98,00, 0,00, 68,00]
sp     3    [98,00, 0,00, 68,00]
avn    2    [15,00, 10,00, 4,30]
avn    4    [15,00, 10,00, 4,30]
avn    3    [15,00, 10,00, 4,30]
psp    1    [17,00, 7,60, 14,30]
psp    3    [17,00, 7,60, 14,30]
psp    5    [17,00, 7,60, 14,30]
...

How can I fix this?

UPD: In fact, there is much more data in the file

d1 = pd.Series({'ds':['104,50', '19,00', '10,00', '30,00', '0,00', '0,00', '16,00', '21,50'],
           'zkp':['33,00', '100,00', '16,00', '3,30', '9,00', '0,00', '0,00', '0,00', '4,80', '78,50'],
           'dgv':['96,00', '0,00', '194,50', '61,00', '0,00', '10,00', '0,00', '28,00', '0,00', '0,00', 
                       '11,00', '30,00', '0,00', '0,00', '0,00', '16,00', '78,50'], 'sp':['98,00', '0,00', '68,00'],
           'psp':['17,00', '7,60', '14,30'],'avn':['15,00', '10,00', '4,30']})

And the table is huge:

type  n_b Day_number
ds     2     1
ds     3     2
ds     1     3
ds     2     4
ds     1     5
ds     3     6
ds     2     7
ds     1     8
sp     2     1
sp     2     2
sp     1     3
avn    2     1
avn    4     2
avn    3     3
psp    1     1
psp    3     2
psp    5     3
sp     2     1
sp     2     2
sp     4     3
...

And all types(ds, zkp, dgv, sp, psp, avn) are in the file. The column "n_b" does not affect the column "c_t". In the column "Day_number" numbered days, if it helps.

And the result should be the following:

 type  n_b Day_number      c_t
ds     2     1           104,50
ds     3     2            19,00
ds     1     3            10,00
ds     2     4            30,00
ds     1     5             0,00
ds     3     6             0,00
ds     2     7            16,00
ds     1     8            21,50
sp     2     1            98,00
sp     2     2             0,00
sp     1     3            68,00
avn    2     1            15,00  
avn    4     2            10,00
avn    3     3             4,30
psp    1     1            17,00  
psp    3     2             7,60
psp    5     3            14,30
sp     2     1            98,00
sp     2     2             0,00
sp     4     3            68,00
...

Upvotes: 2

Views: 213

Answers (2)

jezrael
jezrael

Reputation: 862481

You can create DataFrame from lists first, then create helper columns g with unique values per group with cumcount and last join to df:

#get lenghts of lists
l = d.str.len()
#repeat index values by lengths
r = np.repeat(d.index.values, l)
#flattening lists
v = np.concatenate(d.values)
#DataFrame constructor
df1 = pd.DataFrame({'type':r, 'c_t':v})
print (df1)
     c_t type
0  15,00  avn
1  10,00  avn
2   4,30  avn
3  17,00  psp
4   7,60  psp
5  14,30  psp
6  98,00   sp
7   0,00   sp
8  68,00   sp

df1['g'] = df1.groupby('type').cumcount()
df['g'] = df.groupby('type').cumcount()

df = df.join(df1.set_index(['type','g']), on=['type','g']).drop('g', axis=1)
print (df)
   ype  n_b    c_t
0   sp    2  98,00
1   sp    2   0,00
2   sp    3  68,00
3  avn    2  15,00
4  avn    4  10,00
5  avn    3   4,30
6  psp    1  17,00
7  psp    3   7,60
8  psp    5  14,30

EDIT:

d1 = pd.Series({'ds':['104,50', '19,00', '10,00', '30,00', '0,00', '0,00', '16,00', '21,50'],
           'zkp':['33,00', '100,00', '16,00', '3,30', '9,00', '0,00', '0,00', '0,00', '4,80', '78,50'],
           'dgv':['96,00', '0,00', '194,50', '61,00', '0,00', '10,00', '0,00', '28,00', '0,00', '0,00', 
                       '11,00', '30,00', '0,00', '0,00', '0,00', '16,00', '78,50'], 'sp':['98,00', '0,00', '68,00'],
           'psp':['17,00', '7,60', '14,30'],'avn':['15,00', '10,00', '4,30']})

#get lenghts of lists
l = d1.str.len()
#repeat index values by lengths
r = np.repeat(d1.index.values, l)
#flattening lists
v = np.concatenate(d1.values)
#DataFrame constructor
df1 = pd.DataFrame({'type':r, 'c_t':v})
df1['g'] = df1.groupby('type').cumcount() + 1

df = df.join(df1.set_index(['type','g']), on=['type','Day_number'])
print (df)
   type  n_b  Day_number     c_t
0    ds    2           1  104,50
1    ds    3           2   19,00
2    ds    1           3   10,00
3    ds    2           4   30,00
4    ds    1           5    0,00
5    ds    3           6    0,00
6    ds    2           7   16,00
7    ds    1           8   21,50
8    sp    2           1   98,00
9    sp    2           2    0,00
10   sp    1           3   68,00
11  avn    2           1   15,00
12  avn    4           2   10,00
13  avn    3           3    4,30
14  psp    1           1   17,00
15  psp    3           2    7,60
16  psp    5           3   14,30
17   sp    2           1   98,00
18   sp    2           2    0,00
19   sp    4           3   68,00

Upvotes: 1

Allen Qin
Allen Qin

Reputation: 19947

It seems you're almost there. You've now got:

df
Out[758]: 
  type  n_b                   c_t
0   sp    2  [98,00, 0,00, 68,00]
1   sp    2  [98,00, 0,00, 68,00]
2   sp    3  [98,00, 0,00, 68,00]
3  avn    2  [15,00, 10,00, 4,30]
4  avn    4  [15,00, 10,00, 4,30]
5  avn    3  [15,00, 10,00, 4,30]
6  psp    1  [17,00, 7,60, 14,30]
7  psp    3  [17,00, 7,60, 14,30]
8  psp    5  [17,00, 7,60, 14,30]

One more step to get you the desired output:

#use the row index%3 to select the element from the list under c_t column.
df.c_t=df.apply(lambda x: x.c_t[x.name%3],axis=1)

df
Out[761]: 
  type  n_b    c_t
0   sp    2  98,00
1   sp    2   0,00
2   sp    3  68,00
3  avn    2  15,00
4  avn    4  10,00
5  avn    3   4,30
6  psp    1  17,00
7  psp    3   7,60
8  psp    5  14,30

Upvotes: 1

Related Questions