Reputation: 11

How to assign column heading to corresponding cell values?

I'm just learning Python and for my first project I am trying to re-format a excel table that I can use on GIS. The Table have many columns with x for each corresponding records. I need to assign (replace the x) with the column names and concatenate all rows separated by commas. I was told that Pandas is a very good library to accomplish this. I did started (see sample code) but I am not sure what to do next. Any help or suggestions will be greatly appreciated. Here is a visual representation of what I am trying to accomplish:

Sample Code:

import pandas as pd

input_excel = r"C:\Projects\... Habitat_table.xlsx" # excel sheet path
excel = pd.read_excel(input_excel, sheet_name = 'Species_habitat') # sheet name
final_dataframe = pd.DataFrame (excel, columns=[‘Habitat_A, ‘Habitat B,C,&D’, ‘Habitat_E']) # every single column name
                               
habitats = [‘Habitat_A, ‘Habitat B,C,&D’, ‘Habitat_E']
                                                                  

for index, row in final_dataframe.iterrows():
    final_string = " "
            print (final_dataframe.columns.name)
for h in habitats:
            print(h)
    for c in index:
        if h in index.name: #checks if habitat is in column name
           print(h)
    if row[c] is not null:
            final_string == final_string + c.name + ", "
           print(final_string)

Upvotes: 1

Answers (2)

Алексей Р

Reputation: 7627

data_dict = {
    'Species_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Habitat_A': ['X', '', 'X', 'X', '', 'X', 'X', '', 'X', ''],
    'Habitat B,C,&D': ['X', '', 'X', 'X', '', 'X', 'X', '', 'X', ''],
    'Habitat_E': ['', 'X', '', 'X', 'X', '', 'X', 'X', '', 'X'],
}

df = pd.DataFrame.from_dict(data_dict)
df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: pd.Series([y[0] if y[1] == 'X' else '' for y in x.iteritems()]), axis=1)
df['All_habitats(CONCAT)'] = df.apply(lambda x: ','.join(filter(None, x[1:])), axis=1)
print(df)

Prints:

   Species_ID  Habitat_A  Habitat B,C,&D  Habitat_E                All_habitats(CONCAT)
0           1  Habitat_A  Habitat B,C,&D                       Habitat_A,Habitat B,C,&D
1           2                             Habitat_E                           Habitat_E
2           3  Habitat_A  Habitat B,C,&D                       Habitat_A,Habitat B,C,&D
3           4  Habitat_A  Habitat B,C,&D  Habitat_E  Habitat_A,Habitat B,C,&D,Habitat_E
4           5                             Habitat_E                           Habitat_E
5           6  Habitat_A  Habitat B,C,&D                       Habitat_A,Habitat B,C,&D
6           7  Habitat_A  Habitat B,C,&D  Habitat_E  Habitat_A,Habitat B,C,&D,Habitat_E
7           8                             Habitat_E                           Habitat_E
8           9  Habitat_A  Habitat B,C,&D                       Habitat_A,Habitat B,C,&D
9          10                             Habitat_E                           Habitat_E

Test on 2095 rows * 19 columns from .csv (dummy data)

import pandas as pd, time

tic = time.perf_counter()
df = pd.read_csv(r'c:\Users\Alex20\Documents\Habitats.csv')

df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: pd.Series([y[0] if y[1] == 'X' else '' for y in x.iteritems()]), axis=1)
df['All_habitats(CONCAT)'] = df.apply(lambda x: ','.join(filter(None, x[1:])), axis=1)
print(df)

print(f"Processed in {time.perf_counter() - tic:0.4f} seconds")

Output:

      Species_ID  ...                               All_habitats(CONCAT)
0              1  ...  HabitatA,HabitatB,HabitatC,HabitatD,HabitatF,H...
1              2  ...                         HabitatC,HabitatG,HabitatP
2              3  ...  HabitatA,HabitatB,HabitatC,HabitatE,HabitatG,H...
3              4  ...  HabitatA,HabitatB,HabitatE,HabitatJ,HabitatL,H...
4              5  ...  HabitatD,HabitatI,HabitatK,HabitatL,HabitatM,H...
...          ...  ...                                                ...
2090        2091  ...  HabitatA,HabitatB,HabitatE,HabitatF,HabitatG,H...
2091        2092  ...  HabitatA,HabitatB,HabitatC,HabitatE,HabitatF,H...
2092        2093  ...  HabitatB,HabitatC,HabitatD,HabitatG,HabitatH,H...
2093        2094  ...  HabitatC,HabitatF,HabitatG,HabitatI,HabitatK,H...
2094        2095  ...  HabitatB,HabitatE,HabitatG,HabitatI,HabitatK,H...

[2095 rows x 19 columns]
Processed in 0.4257 seconds

.csv

Upvotes: 2

Caleb Sprague

Reputation: 153

Here is a solution...

import pandas as pd
import numpy as np

data = {'species_id':[1,2,3],
            'Habitat_A':['x',np.nan,'x'],
            'Habitat_B':['x', np.nan,np.nan],
        'Habitat_C':['x', 'x',np.nan],
    }
    
df = pd.DataFrame(data)
display(df)
for col in df.columns:
    df[col] = df[col].replace('x', col+ ' ')
    df[col] = df[col].replace(np.nan, '')
df['All_Habitats']= df['Habitat_A'].astype(str) + df['Habitat_B'].astype(str)  + df['Habitat_C'].astype(str)
display(df)

Output:

    species_id  Habitat_A   Habitat_B   Habitat_C   All_Habitats
0   1           Habitat_A   Habitat_B   Habitat_C   Habitat_A Habitat_B Habitat_C
1   2                                   Habitat_C   Habitat_C
2   3           Habitat_A                           Habitat_A

Upvotes: 0

How to assign column heading to corresponding cell values?

Answers (2)

Test on 2095 rows * 19 columns from .csv (dummy data)

Related Questions