HAH
HAH

Reputation: 211

Splitting text of a single row into multiple rows of the same column in a CSV file using Python

The dictionary has the following key-value pairs:

    {
    'Target_Tab': 'employees',
    ' Target_Col': 'empp_id  last_name  first_name',
    'Source_Col': 'emp_id    l_name    f_name',
    'Source_Tab': 'employee'
    }

I'm writing this dictionary into a CSV file and so far I've got this:

Source_Tab   Source_Col                   Target_Tab    Target_Col
employee     emp_id last_name first_name   employees    empp_id l_name f_name 

I want to write the Source _col and Target_col values in different rows. Se below is what I need:

Source_Tab  Source_Col   Target_Tab  Target_Col
employee    emp_id       employees   empp_id
            last_name                l_name
            first_name               f_name 

My code is as follows:

import pandas as pd
d = [sdict]
d2 = []
col = ["Source_Table","Source_Columns","Target_Table","Target_Columns"]
for i in d:
    temp = {}
    for c in col:
        if c in i:
            temp[c] = i[c]
        else:
            temp[c] = ''
    d2.append(temp)
df2 = pd.DataFrame(d2, columns=col)
df2.to_csv('test21.csv', index=False)

Upvotes: 2

Views: 324

Answers (1)

jezrael
jezrael

Reputation: 862711

Use list comprehension with split for list of Series and join together by concat, last replace missing values by DataFrame.fillna and change order of columns by list col:

d =  {
    'Target_Tab': 'employees',
    'Target_Col': 'empp_id  last_name  first_name',
    'Source_Col': 'emp_id    l_name    f_name',
    'Source_Tab': 'employee'
    }

col = ["Source_Tab","Source_Col","Target_Tab","Target_Col"]
df = pd.concat([pd.Series(v.split(), name=k) for k, v in d.items()], axis=1).fillna('')[col]
print (df)
  Source_Tab Source_Col Target_Tab  Target_Col
0   employee     emp_id  employees     empp_id
1                l_name              last_name
2                f_name             first_name

Another solution:

col = ["Source_Tab","Source_Col","Target_Tab","Target_Col"]
df = pd.Series(d).str.split(expand=True).fillna('').reindex(col).T
print (df)
  Source_Tab Source_Col Target_Tab  Target_Col
0   employee     emp_id  employees     empp_id
1                l_name              last_name
2                f_name             first_name

EDIT:

If need filter keys in source dictionary:

d =  {
    'Target_Tab': 'employees',
    'Target_Col': 'empp_id  last_name  first_name',
    'Source_Col': 'emp_id    l_name    f_name',
    'Source_Tab': 'employee'
    }

L = ['Source_Tab','Source_Col']
df = (pd.concat([pd.Series(v.split(), name=k) for k, v in d.items() if k in L], axis=1)
        .fillna(''))
print (df)
  Source_Col Source_Tab
0     emp_id   employee
1     l_name           
2     f_name           

Upvotes: 2

Related Questions