Reputation: 1066
I am using pandas to send data to a Redshift and in order to make sure that pandas is using the right VARCHAR lengths I am trying to create an auto mapping routine. I am doing this because without it I am getting an error that there is a string too long for VARCHAR(256).
here is my code to create the dict:
dtypes = df.dtypes.apply(lambda x: x.name).to_dict()
Here is the dict output:
{'IssueId': 'object', 'Title': 'object', 'Status': 'object', 'Severity': 'float64', 'NextStepAction': 'object', 'AssignedFolderLabel': 'object', 'Labels': 'object', 'RequesterIdentity': 'object', 'CreateDate': 'datetime64[ns, UTC]', 'LastUpdatedDate': 'datetime64[ns, UTC]', 'ResolvedDate': 'datetime64[ns, UTC]', 'AssigneeIdentity': 'object', 'Priority': 'float64', 'Rank': 'float64', 'PlanningEstimate': 'float64', 'EstimatedStartDate': 'datetime64[ns, UTC]', 'EstimatedCompletionDate': 'datetime64[ns, UTC]', 'ActualStartDate': 'datetime64[ns, UTC]', 'ActualCompletionDate': 'datetime64[ns, UTC]', 'NeedByDate': 'datetime64[ns, UTC]', 'IssueUrl': 'object', 'SubmitterIdentity': 'object', 'ResolvedByIdentity': 'object', 'region': 'object', 'cluster': 'object', 'az': 'object', 'site_id': 'object', 'requesting_org': 'object', 'request_type': 'object', 'requested_need_by_date': 'datetime64[ns, UTC]', 'fe_discipline': 'object', 'related_sim_link': 'object', 'playbook_link': 'object', 'fbn': 'object', 'design_sow_complete_date': 'datetime64[ns, UTC]', 'po_sent_to_vendor_date': 'datetime64[ns, UTC]', 'design_qc_checklist_uploaded': 'float64', 'peer_review_alias': 'object', 'as_built_submittal_rfi_uploaded_to_ds_ticket_link': 'object', 'total_number_of_positions': 'float64', 'reason_sla_was_not_met': 'object', 'will_this_feasibility_study_require_an_fe_design': 'float64', 'initial_kva_total': 'float64', 'fefr_fsb': 'object', 'eam_asset_id': 'float64', 'eam_equipment_class': 'object', 'internal_or_external': 'object', 'if_internal_select_one': 'object', 'if_external_is_the_vendor_onboarded': 'object', 'is_the_training_in_person_or_online': 'object', 'Snapshot': 'datetime64[ns]'}
Here is the code I was attempting to use to only keep the object types:
for k,v in dtypes:
if v != 'object':
dtypes.pop(k)
When I run the above it gives me ValueError: too many values to unpack (expected 2)
I can't figure out why, is it because of the date columns with the [] in them? I tried dtypes = df.dtypes.to_dict()
as well and it is the same error.
Upvotes: 1
Views: 179
Reputation: 88305
Seems like you could simplify the above to:
Object
type columns with select_dtypes
dtypes.to_dict()
df.select_dtypes('O').dtypes.to_dict()
Upvotes: 3
Reputation: 3519
.items
method to get key, value pair.for k,v in dtypes.items():
if v != 'object':
dtypes.pop(k)
Upvotes: 2