Reputation: 955
So I have a file with 500 columns and 600 rows, and I want to take the average of all columns for rows 200-400:
import pandas as pd

df = pd.read_csv('file.csv', sep=r'\s+')
sliced_df = df.iloc[200:400]
Then I create a new column holding the mean of each row across all columns, and extract only that newly created column:
sliced_df['mean'] = sliced_df.mean(axis=1)
final_df = sliced_df['mean']
But how can I prevent the index from being reset when I extract the new column?
Upvotes: 1
Views: 513
Reputation: 863361
I don't think it is necessary to create a new column in sliced_df; just rename the Series returned by mean, and if you need the output as a DataFrame, add to_frame. The index is not reset; see the sample below:
import numpy as np
import pandas as pd

# random dataframe
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5, 5)), columns=list('ABCDE'))
print(df)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4
# in the real data, use df.iloc[200:400]
sliced_df = df.iloc[2:4]
print(sliced_df)
   A  B  C  D  E
2  2  2  1  0  8
3  4  0  9  6  2

final_ser = sliced_df.mean(axis=1).rename('mean')
print(final_ser)
2    2.6
3    4.2
Name: mean, dtype: float64

final_df = sliced_df.mean(axis=1).rename('mean').to_frame()
print(final_df)
   mean
2   2.6
3   4.2
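To check the same thing at the size given in the question, here is a minimal sketch that assumes a synthetic 600 x 500 frame of random integers in place of file.csv (which we don't have). It shows that the row-wise mean of positions 200-399 keeps the original index labels, both as a Series and after to_frame.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for file.csv (assumption): 600 rows x 500 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(600, 500)))

# Row-wise mean over positions 200..399, as a Series named 'mean'
final_ser = df.iloc[200:400].mean(axis=1).rename('mean')

# The same values as a one-column DataFrame; the index is carried over
final_df = final_ser.to_frame()

print(final_ser.index[0], final_ser.index[-1], len(final_ser))  # → 200 399 200
```

No new column is added to the sliced frame, so there is nothing to copy or reset; the index labels simply come along with the Series.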
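Python counts from 0 and the end of an iloc slice is excluded, so df.iloc[200:400] selects positions 200-399, i.e. the 201st to the 400th row. If you count rows from 1, you may need to shift the slice by one, e.g. to 199:399 (or 199:400 to also include the 400th row). See the difference when the sample slice is shifted from 2:4 to 1:3: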
sliced_df = df.iloc[1:3]
print(sliced_df)
   A  B  C  D  E
1  0  4  2  5  2
2  2  2  1  0  8

final_ser = sliced_df.mean(axis=1).rename('mean')
print(final_ser)
1    2.6
2    2.6
Name: mean, dtype: float64

final_df = sliced_df.mean(axis=1).rename('mean').to_frame()
print(final_df)
   mean
1   2.6
2   2.6
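The off-by-one question comes down to iloc being positional and half-open. A small sketch, assuming a 600-row frame with the default RangeIndex and hypothetical 1-based bounds first_row/last_row, compares the three common ways of reading "rows 200-400"; the loc line is shown only as the label-based alternative, which includes both ends.

```python
import pandas as pd

# 600 rows with the default RangeIndex 0..599 (stand-in data)
df = pd.DataFrame({'A': range(600)})

# iloc: positions, stop excluded -> positions 200..399 (200 rows)
by_pos = df.iloc[200:400]

# 1-based, inclusive row numbers (hypothetical bounds): start one position earlier
first_row, last_row = 200, 400
by_row_number = df.iloc[first_row - 1:last_row]   # positions 199..399

# loc: index labels, both ends included -> labels 200..400 (201 rows)
by_label = df.loc[200:400]

print(len(by_pos), len(by_row_number), len(by_label))  # → 200 201 201
```

Whichever slice you choose, the selected rows keep their original index labels.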
Upvotes: 1
Reputation: 141
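Use the copy() method so that sliced_df is an independent DataFrame rather than a possible view of df, which avoids pandas' SettingWithCopyWarning when you add the new column: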
import pandas as pd

df = pd.read_csv('file.csv', sep=r'\s+')
sliced_df = df.iloc[200:400].copy()
sliced_df['mean'] = sliced_df.mean(axis=1)
final_df = sliced_df['mean'].copy()
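A self-contained run of the copy() approach, assuming a small made-up whitespace-separated table read through io.StringIO in place of file.csv. It shows that the extracted column keeps the labels of the sliced rows and that the original df is left without a 'mean' column.

```python
import io
import pandas as pd

# Hypothetical whitespace-separated data standing in for file.csv
text = """A B C
1 2 3
4 5 6
7 8 9
10 11 12
"""
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# copy() gives an independent frame, so the new column goes only into the copy
sliced_df = df.iloc[1:3].copy()
sliced_df['mean'] = sliced_df.mean(axis=1)

# Extract the new column; its index is still [1, 2], not reset to [0, 1]
final_df = sliced_df['mean'].copy()
print(final_df)
```

Note that final_df here is a Series; call to_frame() on it if a DataFrame is needed.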
Upvotes: 0