Xin

Reputation: 674

Print results of if-statement into original data frame

I am trying to classify each column of my data frame with the if-statement below:

Code:

import pandas as pd
import numpy as np

df2 = pd.read_csv("https://data.calgary.ca/api/views/848s-4m4z/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
df2 = df2.iloc[:, 0:3]

num_cols = list(df2.select_dtypes(include=[np.number]).columns.values)
categorical_text_cols = list(df2.select_dtypes(exclude=[np.number]).columns.values)
for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        print('categorical')
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        print('text')
    elif x in num_cols:
        print('numerical')

Output:

text
text
text

I am having a hard time figuring out how to store these results in a data frame with the same columns as the original dataset.

Desired output (in a data frame)

Sector  Community Name  Group Category
text    text            text

Upvotes: 0

Views: 113

Answers (2)

drops

Reputation: 1604

You can collect the result of each branch of your if-statement in a list and then build a one-row dataframe from it:

# create an empty list to hold the results
mem = []

for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        mem.append('categorical')
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        # put result in list
        mem.append('text')
    elif x in num_cols:
        mem.append('numerical')

# put results in dataframe
df3 = pd.DataFrame([mem], columns=df2.columns)
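For anyone who wants to try this without downloading the CSV, here is a minimal self-contained sketch. The sample data is made up (the "Count" column is not in the original dataset), and the branches are reordered so that every column gets exactly one label, which is an assumption on my part rather than the asker's exact logic. If a column matched no branch, mem would end up shorter than df2.columns and the DataFrame constructor would raise.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three columns of the CSV.
df2 = pd.DataFrame({
    "Sector": ["NORTH", "SOUTH", "EAST"],
    "Community Name": ["ABBEYDALE", "ACADIA", "ALBERT PARK"],
    "Count": [3, 5, 8],
})

num_cols = list(df2.select_dtypes(include=[np.number]).columns)

mem = []
for x in df2.columns:
    if x in num_cols:
        mem.append("numerical")
    elif df2[x].str.len().mean() >= 50:
        mem.append("categorical")
    else:
        # Catch-all branch, so every column contributes one entry to mem.
        mem.append("text")

# Wrapping mem in an outer list makes it one row instead of one column.
df3 = pd.DataFrame([mem], columns=df2.columns)
print(df3)
```

The outer brackets in [mem] are what matter here: without them pandas would treat the list as a single column of three values, which does not match the three column names.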

Upvotes: 1

Farid Jafri

Reputation: 356

I am assuming your loop runs once per column, so you can store each result in a dict keyed by column name. Try this:

init_df = {}
for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        init_df[x] = 'categorical'
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        init_df[x] = 'text'
    elif x in num_cols:
        init_df[x] = 'numerical'

And then do:

print(pd.DataFrame([init_df]))
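The same dict idea can be written more compactly with a small helper and a dict comprehension. This is only a sketch: classify_column, its threshold parameter, and the sample data are hypothetical names and values I made up for illustration, and it uses pandas.api.types.is_numeric_dtype in place of the select_dtypes lists from the question.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def classify_column(col: pd.Series, threshold: int = 50) -> str:
    """Label a column using the same 50-character rule as the question."""
    if is_numeric_dtype(col):
        return "numerical"
    # Drop missing values first so they do not count toward the mean length.
    mean_len = col.dropna().astype(str).str.len().mean()
    return "categorical" if mean_len >= threshold else "text"


# Hypothetical sample data, not the real Calgary dataset.
df = pd.DataFrame({
    "Sector": ["NORTH", "SOUTH"],
    "Notes": ["x" * 60, "y" * 60],
    "Count": [1, 2],
})

# One label per column; a dict keeps the original column order.
labels = {name: classify_column(df[name]) for name in df.columns}
result = pd.DataFrame([labels])
print(result)
```

Because the dict is keyed by column name, the resulting frame picks up its column names from the keys, so there is no need to pass columns= separately.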

Upvotes: 1
