Reputation: 99
I have a large dataset of 670 columns and 2856 rows. The idea is to sum two consequent rows and retrieve a single column and value as result. It's important to not have replacement in the way the fist column + second, then the third + the fourth not the second + third.
Index | ID1 | ID2 | ID3 | ID4 |
---|---|---|---|---|
First | 0 | 1 | 0 | 1 |
Second | 0 | 0 | 1 | 1 |
the result should be
Index | ID12 | ID34 |
---|---|---|
First | 1 | 1 |
Second | 0 | 2 |
The example dataframe:
df = pd.DataFrame({"ID1" : [0,0,0,1,1,1] , "ID2" :[1,1,1,0,0,0], "ID3" : [0,1,1,1,0,1]},"ID4" : [0,0,0,0,0,0])
result = pd.DataFrame({"ID1/2" : [1,1,1,0,0,0] , "ID3/4" :[0,1,1,1,0,1]})
I have tried:
res = []
for i in range(len(df)):
for j in range(1,len(df.columns),2):
res.append(data.iloc[i,j]+data.iloc[i,j-1])
result = pd.DataFrame(res)
In R the result is:
result <- matrix(nrow = nrow(df), ncol = ncol(df),)
for (i in seq(1,ncol(df),2)){
result[,i] <- df[,i]+df[,i+1]
}
#Erasing the NAs columns
result <- result [,-seq(2,ncol(result ),2)]
Upvotes: 2
Views: 38
Reputation: 92884
With magic slicing:
res = pd.DataFrame(df.values[:, ::2] + df.values[:, 1::2],
columns=df.columns[::2] + df.columns[1::2].str[2:], index=df.index)
ID12 ID34
First 1 1
Second 0 2
Upvotes: 1
Reputation: 18315
N = len(df.columns)
new_df = df.groupby(np.arange(N) // 2, axis="columns").sum()
new_df.columns = [f"ID{j}{j+1}" for j in range(1, N, 2)]
to get
>>> new_df
ID12 ID34
Index
First 1 1
Second 0 2
Upvotes: 3