Update a value in one column by a condition in another in csv using pandas

Question

I have a csv file where there are columns Position and Quantity(Menge). I have to add copy of rows where(in my case it is Menge) Quantity >1 and I did it using :

file_to_check = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index(drop=True)

this works great, copied exactly as I wanted but I additionally have to update the Positions for them. For example :

Position  Menge  Product
200        3       a
200        3       a
200        3       a
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
400        7       b
200        4       c
200        4       c
200        4       c
200        4       c

I want it to look like this

Position  Menge   Product
200.1        3       a
200.2        3       a 
200.3        3       a
400.1        7       b
400.2        7       b
400.3        7       b
400.4        7       b
400.5        7       b
400.6        7       b
400.7        7       b
200.1        4       c
200.2        4       c
200.3        4       c
200.4        4       c
.
.
.

Afterwards I know I can change the Menge(Quantity) by using :

selected.loc[:, 'Menge'] = 1

I tried using for loop and loc as:

counter=0
if selected.loc[selected.Menge >1]:
    counter=selected['Menge']
    i=1
    while counter>=1:
        selected['Pos.']+=i/10
        i+=1
        counter-=1

But I keep getting the error :

'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'

I searched for the answer but nothing is really helping me. Some help is needed since I am pretty new to python and pandas.

So I edited my question. I have different Products but with some of them have same Pos. . How can I change for each product the Pos. and not add as if all of them are one product.as shown in the table

EdChum · Accepted Answer

You can do it like this:

In[75]:
df['Position'] = df['Position'] + df.groupby('Position')['Position'].rank(method='first')/10
df

Out[75]: 
   Position  Menge
0     200.1      3
1     200.2      3
2     200.3      3
3     400.1      7
4     400.2      7
5     400.3      7
6     400.4      7
7     400.5      7
8     400.6      7
9     400.7      7

So here I groupby on 'Position' and call rank with param method='first' so that equal values are ranked in order of appearance, this effectively ranks the values in order which is the same as a counter

Your error comes from this:

counter=selected['Menge']

and then doing a comparison using:

while counter>=1:

So the error is telling you that the it doesn't understand how to interpret a Series as doing counter >= returns a boolean Series, it wants a scalar boolean value to interpret. You'd have to iterate row-wise so you get a scalar value in order to interpret correctly, besides, you should look to avoid loops where possible as it's slow

EDIT

Based on your new data, you just groupby on multiple columns:

In[81]:
df['Position'] = df['Position'] + df.groupby(['Position','Menge'])['Position'].rank(method='first')/10
df

Out[81]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c

EDIT

OK, to handle the situation where you have more than 10 occurrences so that it avoids doing 201 for instance:

In[98]:
df['Position'] = (df['Position'].astype(str) + '.' + df.groupby(['Position','Menge'])['Position'].rank(method='first').astype(int).astype(str)).astype(float)
df

Out[98]: 
    Position  Menge Product
0      200.1      3       a
1      200.2      3       a
2      200.3      3       a
3      400.1      7       b
4      400.2      7       b
5      400.3      7       b
6      400.4      7       b
7      400.5      7       b
8      400.6      7       b
9      400.7      7       b
10     200.1      4       c
11     200.2      4       c
12     200.3      4       c
13     200.4      4       c

So this converts the output from rank first to an int as it produces a float, then casts to a str so we can just create a string e.g. '200' + '.' + '1' and then cast back to a float if necessary

Update a value in one column by a condition in another in csv using pandas

Answers (2)

Related Questions