akshahi17
akshahi17

Reputation: 155

How to do mathematical operations{eg. diff()} in multiple columns of the DataFrame and save the result in the same DataFrame as a new column

Suppose I have a data frame like this:

df=
    p1  v1  p2  v2  p3  v3  p4  v4  p5  v5  p6  v6
0   3   6   5   8   4   4   8   4   9   6   0   0
1   5   0   5   9   0   8   8   5   5   2   2   9
2   6   9   8   6   9   9   9   2   8   4   2   6
3   4   1   8   0   5   9   0   2   1   2   4   8
4   1   4   8   1   3   1   4   9   6   2   6   7
5   5   4   6   5   5   2   3   0   5   5   6   4
6   4   4   9   0   2   1   7   0   1   0   8   8
7   9   1   7   3   5   4   4   4   8   9   3   8
8   1   5   0   5   4   3   6   5   2   3   1   4
9   9   1   7   6   5   3   6   8   8   4   7   5
10  1   6   5   8   2   5   1   5   3   4   5   8
11  8   7   6   6   9   3   5   5   9   7   6   7

p and v are certain parameters measured for different samples (e.g. 1, 2, 3..etc.). Now I want to multiply all the columns of "p_" by a number and use diff() on all the columns of "v_" to subtracting the subsequent rows of that column.

I want to save the results in the same DataFrame using the corresponding samples name and the first letter of mathematical operation, like Dv1, Dv2 for output of df. diff('v1'), df.diff('v2') and etc. Similarly, for columns of p, it would be like Mp1.

Manually for each column, I can do the operation and save the results, but it's tedious(as the number of samples is very high), so I want to automate it using conditions like for loop.

Any suggestion to do mathematical operations(subtracting, multiplying or dividing) in multiple columns of the pandas' DataFrame and save the result in the same DataFrame as a new column using a combination of column name and mathematical operation name.

New DataFrame should look like this p1 Mp1 v1 DV1 p2 Mp2 v2 Dv2 p3 Mp3 v3 Dv3....... etc.

Upvotes: 1

Views: 118

Answers (2)

Code Different
Code Different

Reputation: 93171

Try this:

# Find all columns that starts with p and followed by a number
p = df.columns[df.columns.str.match('p\d')]

# Find all columns that starts with v and followed by a number
v = df.columns[df.columns.str.match('v\d')]

# Multiply the p columns by 2
mp = df[p].mul(2).add_prefix('M')

# Take a diff of the v columns
dv = df[v].diff().add_prefix('D')

# The display order of the columns
cols = [f'{j}{i}' for i in range(1,7) for j in ['p', 'Mp', 'v', 'Dv']]

# The final result
final = pd.concat([df, mp, dv], axis=1)[cols]

Upvotes: 2

mitoRibo
mitoRibo

Reputation: 4548

Something like:

import pandas as pd
import io

str_data = """
p1,v1,p2,v2,p3,v3,p4,v4,p5,v5,p6,v6
3,6,5,8,4,4,8,4,9,6,0,0
5,0,5,9,0,8,8,5,5,2,2,9
6,9,8,6,9,9,9,2,8,4,2,6
4,1,8,0,5,9,0,2,1,2,4,8
1,4,8,1,3,1,4,9,6,2,6,7
5,4,6,5,5,2,3,0,5,5,6,4
4,4,9,0,2,1,7,0,1,0,8,8
9,1,7,3,5,4,4,4,8,9,3,8
1,5,0,5,4,3,6,5,2,3,1,4
9,1,7,6,5,3,6,8,8,4,7,5
1,6,5,8,2,5,1,5,3,4,5,8
8,7,6,6,9,3,5,5,9,7,6,7
"""

df = pd.read_csv(io.StringIO(str_data))


#Doing this in case you have a pN, but not a vN, or vice versa to avoid errors
p_samples = [int(c[1:]) for c in df.columns if c.startswith('p')]
v_samples = [int(c[1:]) for c in df.columns if c.startswith('v')]
samples = set(p_samples).intersection(v_samples)
samples = sorted(list(samples))

data = {}
mult_num = 7 #not sure what you want to multiply by

for sample in samples:
    p_col = 'p{}'.format(sample)
    v_col = 'v{}'.format(sample)
    
    Mp_col = 'Mp{}'.format(sample)
    Dv_col = 'Dv{}'.format(sample)
    
    data[p_col] = df[p_col]
    data[Mp_col] = mult_num*df[p_col]
    data[v_col] = df[v_col]
    data[Dv_col] = df[v_col].diff()
    
    
new_df = pd.DataFrame(data)
print(new_df)

Upvotes: 1

Related Questions