ar_mm18
ar_mm18

Reputation: 465

How to extract a part of the string to another column

I have a column that contains data like

Dummy data:

df = pd.DataFrame(["Lyreco A-Type small 2i",
"Lyreco C-Type small 4i",
"Lyreco N-Part medium", 
"Lyreco AKG MT 4i small",
"Lyreco AKG/ N-Type medium 4i",
"Lyreco C-Type medium 2i",
"Lyreco C-Type/ SNU medium 2i",
"Lyreco K-part small 4i",
"Lyreco K-Part medium", 
"Lyreco SNU small 2i",
"Lyreco C-Part large 2i",
"Lyreco N-Type large 4i"])

I want to create an extra column that strips the data and gives you the required part of the string(see below) in each row. The extracted column should look like this

Column_1                      Column_2
Lyreco A-Type small 2i         A-Type
Lyreco C-Type small 4i         C-Type
Lyreco N-Part medium           N-Part
Lyreco STU MT 4i small         STU MT
Lyreco AKG/ N-Type medium 4i   AKG/ N-Type
Lyreco C-Type medium 2i        C-Type
Lyreco C-Type/ SNU medium 2i   C-Type/ SNU
Lyreco K-part small 4i         K-part
Lyreco K-Part medium           K-Part
Lyreco SNU small 2i            SNU
Lyreco C-Part large 2i         C-Part
Lyreco N-Type large 4i         N-Type

How can I extract column 2 from the first column?

Upvotes: 4

Views: 1089

Answers (3)

alec_djinn
alec_djinn

Reputation: 10789

Looking at the example you posted, it's enough to split the column values and return "the middle" items. You can make a simple function to encapsulate the logic and apply it to the dataframe.

from math import floor

df = pd.DataFrame(
    {'Columns_1':
     ["Lyreco A-Type small 2i",
      "Lyreco C-Type small 4i",
      "Lyreco N-Part medium", 
      "Lyreco AKG MT 4i small",
      "Lyreco AKG/ N-Type medium 4i",
      "Lyreco C-Type medium 2i",
      "Lyreco C-Type/ SNU medium 2i",
      "Lyreco K-part small 4i",
      "Lyreco K-Part medium", 
      "Lyreco SNU small 2i",
      "Lyreco C-Part large 2i",
      "Lyreco N-Type large 4i"
     ]
    }
)


def f(row):
    blocks = row['Columns_1'].split()
    mid_index = 1 if len(blocks) <= 4 else floor(len(blocks)/2)
    return ' '.join(blocks[1:mid_index+1])

df['Columns_2'] = df.apply(f, axis=1)

print(df)

Output:

                       Columns_1    Columns_2
0         Lyreco A-Type small 2i       A-Type
1         Lyreco C-Type small 4i       C-Type
2           Lyreco N-Part medium       N-Part
3         Lyreco AKG MT 4i small       AKG MT
4   Lyreco AKG/ N-Type medium 4i  AKG/ N-Type
5        Lyreco C-Type medium 2i       C-Type
6   Lyreco C-Type/ SNU medium 2i  C-Type/ SNU
7         Lyreco K-part small 4i       K-part
8           Lyreco K-Part medium       K-Part
9            Lyreco SNU small 2i          SNU
10        Lyreco C-Part large 2i       C-Part
11        Lyreco N-Type large 4i       N-Type

Upvotes: 2

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520988

You might find that the following logic works with your data:

df["Column_2"] = df["Column_1"].str.extract(r'\w+ (\S+(?: \S+)*) \b(?:small|medium|large)\b')

The above pattern matches from the second term until reaching small, medium, or large keywords. Here is a working regex demo.

Upvotes: 3

eeannan
eeannan

Reputation: 1

df.columns = ['column_1']

df["column_2"] = [col.split(" ")[1] for col in df.column_1]

Upvotes: 0

Related Questions