Reputation: 2559
I am a beginner, self-taught.
I have this initial matrix:
,,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0
And I would like to obtain:
,,1.0,2.0,3.0,4.0,8.0,9.0
,,C,O,O,C,C,C
1.0,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,1.4784953771865779,2.4427526711622995
2.0,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,2.3918864536893496,2.871975783234887
3.0,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.3236193736523485,3.560975969980456
4.0,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,3.6419246237091034,4.772348473634059
8.0,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,0.0,1.324770443414403
9.0,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,1.324770443414403,0.0
What I have tried is:
aa=0
bb=0
for i in range(0,a):
if matrix.iloc[i,1] != "H":
for j in range(0,a):
if matrix.iloc[1,j] != "H":
#########################
d_nH2.iloc[i-aa][j-bb]=matrix.iloc[i,j]
#########################
else:
bb=bb+1
bb=0
else:
aa=aa+1
And I got:
0 1 2 3 4 5 6 7
0 NaN NaN 1.000000 2.000000 3.000000 4.000000 8.000000 9.000000
1 NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2 1.0 0.0 0.000000 1.205475 1.342932 2.343014 1.478495 2.442753
3 2.0 0.0 1.205475 0.000000 2.245468 2.644316 2.391886 2.871976
4 3.0 0.0 1.342932 2.245468 0.000000 1.418916 2.323619 3.560976
5 4.0 0.0 2.343014 2.644316 1.418916 0.000000 3.641925 4.772348
6 8.0 0.0 1.478495 2.391886 2.323619 3.641925 0.000000 1.324770
7 9.0 0.0 2.442753 2.871976 3.560976 4.772348 1.324770 0.000000
The row and column with letters are not copied. why?
I need to work on this mixed table, with numbers and letters, to do after that the following operations:
Upvotes: 0
Views: 69
Reputation: 261280
You should refrain from thinking of your data as structures in other langages (such as bash). An important point in using numpy/pandas is that you can vectorize operations. Most of the time your code will be shorter, simpler, and much faster. Every time you think you need to loop over values, think whether there is not a simpler vectorial approach.
The first two columns and rows should be treated as (multi)indexes (pd.MultiIndex), this is semantically the most logic. This way the data is only floats and it's much easier to select rows/columns by index properties. Think of your data for what it really represents: here a matrix of numbers with labels.
Below is an import from string but you will probably want to import from your csv file directly replacing the io.StringIO
block by your filename: pd.read_csv('file.csv', index_col=[0,1], header=[0,1])
import io
df = pd.read_csv(io.StringIO(''',,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0'''),
index_col=[0,1],
header=[0,1])
Now your dataframe is of type float
, you can check it with df.dtypes
that will output float64
for each column
With indexes it is very easy and doesn't require manual computations. The level
option indicates that you are matching the values on the second level of the indexes (second = 1 in python)
df.drop(columns='H', index='H', level=1)
output:
1 2 3 4 8 9
C O O C C C
1 C 0.000000 1.205475 1.342932 2.343014 1.478495 2.442753
2 O 1.205475 0.000000 2.245468 2.644316 2.391886 2.871976
3 O 1.342932 2.245468 0.000000 1.418916 2.323619 3.560976
4 C 2.343014 2.644316 1.418916 0.000000 3.641925 4.772348
8 C 1.478495 2.391886 2.323619 3.641925 0.000000 1.324770
9 C 2.442753 2.871976 3.560976 4.772348 1.324770 0.000000
Here is the logic step by step (run only the last one):
df.drop(index='H', columns=list('CO'), level=1)
: 5 6 7 10 11 12
H H H H H H
1 C 3.227383 2.640130 2.640148 3.440470 2.650642 2.173942
2 O 3.702906 2.635454 2.635572 3.947952 2.593645 3.289695
3 O 2.015477 2.069309 2.069296 4.431347 3.951844 2.436642
4 C 1.086885 1.092126 1.092126 5.725282 4.948742 3.855294
8 C 4.336841 3.999843 3.999903 2.107792 2.085365 1.079296
9 C 5.570012 5.001451 5.001556 1.076371 1.078101 2.119237
(df.drop(index='H', columns=list('CO'), level=1)<1.6)
5 6 7 10 11 12
H H H H H H
1 C False False False False False False
2 O False False False False False False
3 O False False False False False False
4 C True True True False False False
8 C False False False False False True
9 C False False False True True False
(df.drop(index='H', columns=list('CO'), level=1)<1.6).sum(axis=1)
1 C 0
2 O 0
3 O 0
4 C 3
8 C 1
9 C 2
NB. as your matrix is symetrical, the transposed operation (df.drop(index=['C', 'O'], columns='H', level=1)<1.6).sum()
would give the exact same output
Upvotes: 1