Split a text(with names and values) column into multiple columns in Pandas DataFrame

Question

My algorithm is too slow. I have a large DataFrame and want to create new columns based on the names and values stored in another column. I am looking for a solution, ideally in pandas. I don't know in advance how many columns there will be. Here is a simple schema of the data:

"column"<==>"value"
"column"<==> "value"
...

My DataFrame:

id |     params
---|----------------------
0  | currency<=>PLN
   | price<=>72.14
   | city<==>Berlin
---|----------------------
1  | price<=>90
   | area<=>72.14
   | city<==>San Francisco
   | rooms<==>2
   | is_Free<==>1
---|----------------------
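
For anyone who wants to reproduce this, the frame above could be built roughly as follows. This is only a sketch: it assumes each `params` cell holds its pairs as one string separated by newline characters, which is inferred from the layout shown and from the `split` call in the code further down.

```python
import pandas as pd

# Assumption: each cell stores its "name<=>value" pairs separated by "\n".
df = pd.DataFrame(
    {
        "id": [0, 1],
        "params": [
            "currency<=>PLN\nprice<=>72.14\ncity<==>Berlin",
            "price<=>90\narea<=>72.14\ncity<==>San Francisco\nrooms<==>2\nis_Free<==>1",
        ],
    }
)

print(df)
```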

And I would like to get something like this:

   id | price | currency |      city     | rooms | is_Free | area  |
   ---|-------|----------|---------------|-------|---------|-------|
     0| 72.14 |  PLN     |     Berlin    |  NaN  |   NaN   |  NaN  |
   ---|-------|----------|---------------|-------|---------|-------|
     1|  90   |  NaN     | San Francisco |   2   |    1    | 72.14 |

My solution:

def add_parameters(df):
    for i,row in df.iterrows():
        parameters_list = row.params.split("\n")
        for parameter in parameters_list:
            elem_list = parameter.split("<=>")
            if elem_list[0]  and elem_list[1] != '':
                df.loc[i, elem_list[0]] = elem_list[1]
    return df

Thanks

sushanth · Accepted Answer

This is one way of approaching the problem.

import re

import pandas as pd

# Handle multiple separators (e.g. "<=>" and "<==>").
sep = re.compile(r"(<.*>)")


def split(value):
    ret = {}
    for s in value.split("\n"):
        # Check whether a separator exists in the string and split on it.
        if sep.search(s):
            split_ = s.split(sep.search(s).group())
            ret[split_[0]] = split_[1]

    return ret

print(df['params'].apply(split).apply(pd.Series))

Output

  currency  price           city   area rooms is_Free
0      PLN  72.14         Berlin    NaN   NaN     NaN
1      NaN     90  San Francisco  72.14     2       1
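
As a possible variation (not part of the answer above): `apply(pd.Series)` builds one Series per row, which tends to get slow on large frames. A minimal sketch of an alternative is to parse every cell into a plain dict and hand the whole list to the `pd.DataFrame` constructor once. The names `SEP` and `parse_params` are hypothetical, the narrower pattern `<=+>` is an assumption that the separator is always `<`, one or more `=`, then `>`, and the sample frame assumes newline-separated pairs as in the question.

```python
import re

import pandas as pd

# Assumption: separators look like "<=>" or "<==>" (one or more "=").
SEP = re.compile(r"<=+>")


def parse_params(text):
    """Turn one 'name<=>value' block into a dict, skipping malformed lines."""
    out = {}
    for line in text.split("\n"):
        parts = SEP.split(line, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            out[parts[0].strip()] = parts[1].strip()
    return out


# Sample data mirroring the question.
df = pd.DataFrame(
    {
        "id": [0, 1],
        "params": [
            "currency<=>PLN\nprice<=>72.14\ncity<==>Berlin",
            "price<=>90\narea<=>72.14\ncity<==>San Francisco\nrooms<==>2\nis_Free<==>1",
        ],
    }
)

# Build all new columns in one constructor call instead of row by row.
wide = pd.DataFrame([parse_params(p) for p in df["params"]], index=df.index)
result = df[["id"]].join(wide)

print(result)
```

All values come back as strings, so numeric columns such as `price` or `rooms` would still need a conversion step, for example with `pd.to_numeric`.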
