DHJ

Reputation: 621

How to split a string in a pandas dataframe, and return multiple dataframes

I have a pandas dataframe containing strings:

import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

df
Out[0]: 
               column1              column2              column3
0        One_Two_Three  nrOne_nrTwo_nrThree   First_Second_Third
1   First_Second_Third   First_Second_Third        One_Two_Three
2  nrOne_nrTwo_nrThree        One_Two_Three  nrOne_nrTwo_nrThree

I would like to end up with three dataframes: the first containing the characters before the first underscore, the second containing the characters between the first and second underscores, and the third containing the last part. The first one would look like this:

df_one
Out[1]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne

I've tried

df_temp = df.apply(lambda x: x.str.split('_'))

df_temp
Out[2]: 
                   column1                  column2                  column3
0        [One, Two, Three]  [nrOne, nrTwo, nrThree]   [First, Second, Third]
1   [First, Second, Third]   [First, Second, Third]        [One, Two, Three]
2  [nrOne, nrTwo, nrThree]        [One, Two, Three]  [nrOne, nrTwo, nrThree]

That splits each string into a list. I then tried:

df_temp.apply(lambda x: x[0])
Out[3]: 
  column1  column2 column3
0     One    nrOne   First
1     Two    nrTwo  Second
2   Three  nrThree   Third

But this returns the split parts of the first row only, rather than the first part of every cell. Does anyone have a solution?

Upvotes: 2

Views: 45

Answers (1)

DHJ

Reputation: 621

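One solution is to use applymap, which applies the function to every cell instead of to every column (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):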

df_temp.applymap(lambda x: x[0])
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
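
To get all three dataframes rather than just the first, the list index can be made a parameter. Below is a minimal sketch, assuming every cell holds at least three underscore-separated parts. The helper name nth_part is hypothetical, not part of pandas. It uses the .str accessor on each column instead of applymap, so it should not depend on which pandas version is installed:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

def nth_part(frame, n, sep='_'):
    """Return a dataframe holding the n-th piece of every cell (hypothetical helper)."""
    # apply passes each column as a Series; .str.split gives lists,
    # and .str[n] picks element n from each list.
    return frame.apply(lambda col: col.str.split(sep).str[n])

# One dataframe per position in the split string.
df_one, df_two, df_three = (nth_part(df, n) for n in range(3))

print(df_one)
```

If some cells have fewer parts than expected, .str[n] yields a missing value for those cells instead of raising an error.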

Another is to use apply on a Series, by stacking and unstacking:

df_temp.stack().apply(lambda x: x[0]).unstack()
Out[0]: 
  column1 column2 column3
0     One   nrOne   First
1   First   First     One
2   nrOne     One   nrOne
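
The stacking idea can also be taken one step further to produce all three dataframes in one pass. This is a sketch under the assumption that every cell splits into the same number of parts. The variable names stacked, parts and frames are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['One_Two_Three', 'First_Second_Third', 'nrOne_nrTwo_nrThree'],
    'column2': ['nrOne_nrTwo_nrThree', 'First_Second_Third', 'One_Two_Three'],
    'column3': ['First_Second_Third', 'One_Two_Three', 'nrOne_nrTwo_nrThree'],
})

# Stack into a single Series indexed by (row, column).
stacked = df.stack()

# expand=True returns a dataframe with one column per split position (0, 1, 2).
parts = stacked.str.split('_', expand=True)

# Unstack each position back into the original row/column layout.
frames = [parts[i].unstack() for i in parts.columns]
df_one, df_two, df_three = frames

print(df_two)
```

Splitting once on the stacked Series avoids repeating the split for each position, at the cost of building an intermediate dataframe with one row per original cell.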

Upvotes: 1
