Another.Chemist
Another.Chemist

Reputation: 2559

Python: Why cannot I copy a string from one df to another one?

I am a beginner, self-taught.

I have this initial matrix:

,,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0

And I would like to obtain:

,,1.0,2.0,3.0,4.0,8.0,9.0
,,C,O,O,C,C,C
1.0,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,1.4784953771865779,2.4427526711622995
2.0,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,2.3918864536893496,2.871975783234887
3.0,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.3236193736523485,3.560975969980456
4.0,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,3.6419246237091034,4.772348473634059
8.0,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,0.0,1.324770443414403
9.0,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,1.324770443414403,0.0

What I have tried is:

aa=0
bb=0
for i in range(0,a):
    if matrix.iloc[i,1] != "H":
        for j in range(0,a):
            if matrix.iloc[1,j] != "H":
#########################
                d_nH2.iloc[i-aa][j-bb]=matrix.iloc[i,j]
#########################
            else:
                bb=bb+1
        bb=0
    else:
        aa=aa+1

And I got:

     0    1         2         3         4         5         6         7
0  NaN  NaN  1.000000  2.000000  3.000000  4.000000  8.000000  9.000000
1  NaN  NaN  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2  1.0  0.0  0.000000  1.205475  1.342932  2.343014  1.478495  2.442753
3  2.0  0.0  1.205475  0.000000  2.245468  2.644316  2.391886  2.871976
4  3.0  0.0  1.342932  2.245468  0.000000  1.418916  2.323619  3.560976
5  4.0  0.0  2.343014  2.644316  1.418916  0.000000  3.641925  4.772348
6  8.0  0.0  1.478495  2.391886  2.323619  3.641925  0.000000  1.324770
7  9.0  0.0  2.442753  2.871976  3.560976  4.772348  1.324770  0.000000

The row and column with letters are not copied. why?

I need to work on this mixed table, with numbers and letters, to do after that the following operations:

  1. Obtain a new matrix from the initial by filtering 'H'
  2. With the new table, I want to write two nested loops to do: for each letter that is != 'H' counts for each letter how many times that row that cross a column with 'H' is lesser that 1.6.
  3. What I have in mind is to treat any table as in bash: a mixture of strings and numbers and each value is easy accesable

Upvotes: 0

Views: 69

Answers (1)

mozway
mozway

Reputation: 261280

You should refrain from thinking of your data as structures in other langages (such as bash). An important point in using numpy/pandas is that you can vectorize operations. Most of the time your code will be shorter, simpler, and much faster. Every time you think you need to loop over values, think whether there is not a simpler vectorial approach.

Input data

The first two columns and rows should be treated as (multi)indexes (pd.MultiIndex), this is semantically the most logic. This way the data is only floats and it's much easier to select rows/columns by index properties. Think of your data for what it really represents: here a matrix of numbers with labels.

Below is an import from string but you will probably want to import from your csv file directly replacing the io.StringIO block by your filename: pd.read_csv('file.csv', index_col=[0,1], header=[0,1])

import io
df = pd.read_csv(io.StringIO(''',,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0'''),
                index_col=[0,1],
                header=[0,1])

Now your dataframe is of type float, you can check it with df.dtypes that will output float64 for each column

How to filterout the 'H' rows/columns

With indexes it is very easy and doesn't require manual computations. The level option indicates that you are matching the values on the second level of the indexes (second = 1 in python)

df.drop(columns='H', index='H', level=1)

output:

            1         2         3         4         8         9
            C         O         O         C         C         C
1 C  0.000000  1.205475  1.342932  2.343014  1.478495  2.442753
2 O  1.205475  0.000000  2.245468  2.644316  2.391886  2.871976
3 O  1.342932  2.245468  0.000000  1.418916  2.323619  3.560976
4 C  2.343014  2.644316  1.418916  0.000000  3.641925  4.772348
8 C  1.478495  2.391886  2.323619  3.641925  0.000000  1.324770
9 C  2.442753  2.871976  3.560976  4.772348  1.324770  0.000000

How to count how many times a non-H row crosses an H column with value <1.6

Here is the logic step by step (run only the last one):

  1. drop H rows and non-H columns: df.drop(index='H', columns=list('CO'), level=1):
            5         6         7        10        11        12
            H         H         H         H         H         H
1 C  3.227383  2.640130  2.640148  3.440470  2.650642  2.173942
2 O  3.702906  2.635454  2.635572  3.947952  2.593645  3.289695
3 O  2.015477  2.069309  2.069296  4.431347  3.951844  2.436642
4 C  1.086885  1.092126  1.092126  5.725282  4.948742  3.855294
8 C  4.336841  3.999843  3.999903  2.107792  2.085365  1.079296
9 C  5.570012  5.001451  5.001556  1.076371  1.078101  2.119237
  1. check whether < 1.6: (df.drop(index='H', columns=list('CO'), level=1)<1.6)
         5      6      7     10     11     12
         H      H      H      H      H      H
1 C  False  False  False  False  False  False
2 O  False  False  False  False  False  False
3 O  False  False  False  False  False  False
4 C   True   True   True  False  False  False
8 C  False  False  False  False  False   True
9 C  False  False  False   True   True  False
  1. count the True: (df.drop(index='H', columns=list('CO'), level=1)<1.6).sum(axis=1)
1  C    0
2  O    0
3  O    0
4  C    3
8  C    1
9  C    2

NB. as your matrix is symetrical, the transposed operation (df.drop(index=['C', 'O'], columns='H', level=1)<1.6).sum() would give the exact same output

Upvotes: 1

Related Questions