Reputation: 674
Below I am trying to identify the type of the columns based on my if-statement:
Code:
import pandas as pd
import numpy as np
df2 = pd.read_csv("https://data.calgary.ca/api/views/848s-4m4z/rows.csv?accessType=DOWNLOAD&bom=true&format=true")
df2 = df2.iloc[:, 0:3]
num_cols = list(df2.select_dtypes(include=[np.number]).columns.values)
categorical_text_cols = list(df2.select_dtypes(exclude=[np.number]).columns.values)
for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        print('categorical')
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        print('text')
    elif x in num_cols:
        print('numerical')
Output:
text
text
text
I am having a hard time figuring out how to store these results in a data frame with the same columns as the original dataset.
Desired output (in a data frame):
Sector    Community Name    Group Category
text      text              text
Upvotes: 0
Views: 113
Reputation: 1604
You can store the result of each branch of your if-statement in a list and then create a dataframe from that list:
# create an empty list to hold the results
mem = []
for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        mem.append('categorical')
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        # put result in list
        mem.append('text')
    elif x in num_cols:
        mem.append('numerical')
# put the results in a one-row dataframe with the original column names
df3 = pd.DataFrame([mem], columns=df2.columns)
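If you want to try this without downloading the CSV, here is a minimal sketch on a small made-up frame. The name toy and its values are hypothetical stand-ins for df2, not the Calgary data; the branching is the same as above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2 (made-up values, not the Calgary dataset)
toy = pd.DataFrame({
    "Sector": ["NORTH", "SOUTH", "WEST"],
    "Community Name": ["ABBEYDALE", "ACADIA", "ALBERT PARK"],
    "Count": [1, 2, 3],
})

num_cols = list(toy.select_dtypes(include=[np.number]).columns)
categorical_text_cols = list(toy.select_dtypes(exclude=[np.number]).columns)

# One label per column, collected in column order
mem = []
for x in toy.columns:
    if x in categorical_text_cols and toy[x].str.len().mean() >= 50:
        mem.append("categorical")
    elif x in categorical_text_cols and toy[x].str.len().mean() <= 50:
        mem.append("text")
    elif x in num_cols:
        mem.append("numerical")

# Wrapping mem in a list makes it a single row under the original column names
result = pd.DataFrame([mem], columns=toy.columns)
print(result)
```

Note that the list must end up with exactly one entry per column. If a column falls through all three branches (for example, a text column that is entirely missing, so its mean length is NaN), the DataFrame constructor will complain about a length mismatch.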
Upvotes: 1
Reputation: 356
I am assuming your loop runs exactly once per column. Try this:
init_df = {}
for x in df2.columns:
    if x in categorical_text_cols and df2[x].str.len().mean() >= 50:
        init_df[x] = 'categorical'
    elif x in categorical_text_cols and df2[x].str.len().mean() <= 50:
        init_df[x] = 'text'
    elif x in num_cols:
        init_df[x] = 'numerical'
And then do:
print(pd.DataFrame([init_df]))
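The same dictionary idea can also be written with a small helper, which may be easier to reuse. This is only a sketch: classify_column, its threshold parameter, and the toy frame are names I made up for illustration, and it swaps the select_dtypes lists for pandas.api.types.is_numeric_dtype, which (unlike np.number) also counts bool columns as numeric.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def classify_column(col: pd.Series, threshold: float = 50) -> str:
    """Label one column with the same rule as in the question (hypothetical helper)."""
    if is_numeric_dtype(col):
        return "numerical"
    # Casting to str keeps .str usable for non-numeric, non-string columns such as dates
    mean_len = col.dropna().astype(str).str.len().mean()
    # Same threshold direction as the question: long values are labeled 'categorical'
    return "categorical" if mean_len >= threshold else "text"


# Made-up data for illustration only
toy = pd.DataFrame({
    "Sector": ["NORTH", "SOUTH"],
    "Notes": ["x" * 60, "y" * 70],
    "Count": [1, 2],
})

# One dictionary entry per column, then a one-row dataframe
labels = {name: classify_column(toy[name]) for name in toy.columns}
result = pd.DataFrame([labels])
print(result)
```

Because the helper always returns a label, every column gets an entry, so the resulting frame always has the same columns as the input.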
Upvotes: 1