Reputation: 155
Suppose I have a data frame like this:
df=
p1 v1 p2 v2 p3 v3 p4 v4 p5 v5 p6 v6
0 3 6 5 8 4 4 8 4 9 6 0 0
1 5 0 5 9 0 8 8 5 5 2 2 9
2 6 9 8 6 9 9 9 2 8 4 2 6
3 4 1 8 0 5 9 0 2 1 2 4 8
4 1 4 8 1 3 1 4 9 6 2 6 7
5 5 4 6 5 5 2 3 0 5 5 6 4
6 4 4 9 0 2 1 7 0 1 0 8 8
7 9 1 7 3 5 4 4 4 8 9 3 8
8 1 5 0 5 4 3 6 5 2 3 1 4
9 9 1 7 6 5 3 6 8 8 4 7 5
10 1 6 5 8 2 5 1 5 3 4 5 8
11 8 7 6 6 9 3 5 5 9 7 6 7
p and v are certain parameters measured for different samples (e.g. 1, 2, 3..etc.). Now I want to multiply all the columns of "p_" by a number and use diff()
on all the columns of "v_" to subtracting the subsequent rows of that column.
I want to save the results in the same DataFrame using the corresponding samples name and the first letter of mathematical operation, like Dv1, Dv2 for output of df. diff('v1'), df.diff('v2')
and etc. Similarly, for columns of p, it would be like Mp1.
Manually for each column, I can do the operation and save the results, but it's tedious(as the number of samples is very high), so I want to automate it using conditions like for loop.
Any suggestion to do mathematical operations(subtracting, multiplying or dividing) in multiple columns of the pandas' DataFrame and save the result in the same DataFrame as a new column using a combination of column name and mathematical operation name.
New DataFrame should look like this p1 Mp1 v1 DV1 p2 Mp2 v2 Dv2 p3 Mp3 v3 Dv3.......
etc.
Upvotes: 1
Views: 118
Reputation: 93171
Try this:
# Find all columns that starts with p and followed by a number
p = df.columns[df.columns.str.match('p\d')]
# Find all columns that starts with v and followed by a number
v = df.columns[df.columns.str.match('v\d')]
# Multiply the p columns by 2
mp = df[p].mul(2).add_prefix('M')
# Take a diff of the v columns
dv = df[v].diff().add_prefix('D')
# The display order of the columns
cols = [f'{j}{i}' for i in range(1,7) for j in ['p', 'Mp', 'v', 'Dv']]
# The final result
final = pd.concat([df, mp, dv], axis=1)[cols]
Upvotes: 2
Reputation: 4548
Something like:
import pandas as pd
import io
str_data = """
p1,v1,p2,v2,p3,v3,p4,v4,p5,v5,p6,v6
3,6,5,8,4,4,8,4,9,6,0,0
5,0,5,9,0,8,8,5,5,2,2,9
6,9,8,6,9,9,9,2,8,4,2,6
4,1,8,0,5,9,0,2,1,2,4,8
1,4,8,1,3,1,4,9,6,2,6,7
5,4,6,5,5,2,3,0,5,5,6,4
4,4,9,0,2,1,7,0,1,0,8,8
9,1,7,3,5,4,4,4,8,9,3,8
1,5,0,5,4,3,6,5,2,3,1,4
9,1,7,6,5,3,6,8,8,4,7,5
1,6,5,8,2,5,1,5,3,4,5,8
8,7,6,6,9,3,5,5,9,7,6,7
"""
df = pd.read_csv(io.StringIO(str_data))
#Doing this in case you have a pN, but not a vN, or vice versa to avoid errors
p_samples = [int(c[1:]) for c in df.columns if c.startswith('p')]
v_samples = [int(c[1:]) for c in df.columns if c.startswith('v')]
samples = set(p_samples).intersection(v_samples)
samples = sorted(list(samples))
data = {}
mult_num = 7 #not sure what you want to multiply by
for sample in samples:
p_col = 'p{}'.format(sample)
v_col = 'v{}'.format(sample)
Mp_col = 'Mp{}'.format(sample)
Dv_col = 'Dv{}'.format(sample)
data[p_col] = df[p_col]
data[Mp_col] = mult_num*df[p_col]
data[v_col] = df[v_col]
data[Dv_col] = df[v_col].diff()
new_df = pd.DataFrame(data)
print(new_df)
Upvotes: 1