K2A_Dev
K2A_Dev

Reputation: 45

Rebuild pandas Dataframe

I'm pretty new to Python. I'm having trouble with reconstructing a bad Dataframe. My Dataframe looks like this:

df = pd.DataFrame({'col1': ['id 1', 'id 2', 'test 3', 'test 4'],
               'col2': ['test 1', 'test 2',
                        'Number 6135', 'id 4'],
               'col3': ['Number 12344', 'Number 21612','id 3','Number 1131']})

enter image description here

As you can See I have the "id´s" in the first, second and third col. The same for the other Values.

My Goal is to have a Dataframe in which each Col has only the Values beginning with the Same substring like this example:

enter image description here

I already tried using for Loops and if statements to check if a certain Substring is in my String. if that's True I create a new Column and Write my Value inside it

for x in df['col1']:
if 'id' in x:
    df['newCol']=x

I know this code will never solve my Problem, maybe there is some pandas Function which can help me to solve this Problem

If something is not clear, just ask me I'll try my best to explain my Problem.

Upvotes: 3

Views: 477

Answers (4)

jezrael
jezrael

Reputation: 863291

If possible simplify solution by split values by first blank:

df = (df.reset_index()
        .melt('index')
        .assign(new = lambda x: x['value'].str.split().str[0])
        .pivot('index','new','value'))
print (df)
new          Number    id    test
index                            
0      Number 12344  id 1  test 1
1      Number 21612  id 2  test 2
2       Number 6135  id 3  test 3
3       Number 1131  id 4  test 4

Else you can use Series.str.extract:

L = ['id','test','Number']

df = (df.reset_index()
        .melt('index')
        .assign(new = lambda x: x['value'].str.extract(f'({"|".join(L)})', expand=False))
        .pivot('index','new','value'))
print (df)
new          Number    id    test
index                            
0      Number 12344  id 1  test 1
1      Number 21612  id 2  test 2
2       Number 6135  id 3  test 3
3       Number 1131  id 4  test 4

Upvotes: 3

sammywemmy
sammywemmy

Reputation: 28709

You could first melt the dataframe, then use numpy select to reorder the names, and finally pivot :

(df.melt(ignore_index=False)
   .assign(variable=lambda x: np.select([x.value.str.startswith("id"),
                                          x.value.str.startswith("test"),
                                          x.value.str.startswith("Number")],
                                          ["col1", "col2", "col3"]))
    .reset_index()
    .pivot("index", "variable", "value")
    .rename_axis(columns=None, index=None))


    col1    col2    col3
0   id 1    test 1  Number 12344
1   id 2    test 2  Number 21612
2   id 3    test 3  Number 6135
3   id 4    test 4  Number 1131

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195553

You can use sorted() with custom key function:

def key_fn(x):
    if 'id' in x:
        return 0
    if 'test' in x:
        return 1
    if 'Number' in x:
        return 2
    return 3 

df = df.apply(lambda x: pd.Series(sorted(x, key=key_fn)), axis=1)
df = df.rename(columns=lambda x: 'col{}'.format(x+1))
print(df)

Prints:

   col1    col2          col3
0  id 1  test 1  Number 12344
1  id 2  test 2  Number 21612
2  id 3  test 3   Number 6135
3  id 4  test 4   Number 1131

Another version, from the comments:

df = pd.DataFrame([sorted(l, key=key_fn) for l in df.values], columns=df.columns)
print(df)

Upvotes: 3

Andy L.
Andy L.

Reputation: 25259

Try this:

s = df.melt()['value']
df_final = pd.DataFrame({x: s[s.str.startswith(x)].values 
                                        for x in s.str.split().str[0].unique()})

Out[27]:
     id    test        Number
0  id 1  test 3   Number 6135
1  id 2  test 4  Number 12344
2  id 4  test 1  Number 21612
3  id 3  test 2   Number 1131

Upvotes: 0

Related Questions