Vj_x49x

Reputation: 47

Assign a particular ID to each dataframe matching a regex pattern

I need to assign a particular ID to each dataframe that matches the conditions below:

fd = df[(df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df.A.isnull()]
fd2 = df[(df['B'].str.match('.*Type:.*')) & df.A.isnull()]

In the output file, both dataframes are written one below the other. I need to add a column C where ID 1 is assigned to the rows from fd and ID 2 is assigned to the rows from fd2. This would help when filtering the dataframes later.

This is the current output

A   B
nan this has Color:Red
nan Color: Blue,red, green
nan Color: Yellow
nan This has many colors. Color: green, red, Yellow
nan Filter oil Type: Synthetic Motor oil
nan Oil Type : High Mileage Motor oil

Expected Output

A   B                                                C
nan this has Color:Red                               1
nan Color: Blue,red, green                           1
nan Color: Yellow                                    1
nan This has many colors. Color: green, red, Yellow  1
nan Filter oil Type: Synthetic Motor oil             2
nan Oil Type : High Mileage Motor oil                2

Upvotes: 0

Views: 103

Answers (1)

wwii

Reputation: 23753

You can add a new column C and assign ID 1 or 2 to it, depending on which regex condition each row matches.

In [17]: df
Out[17]: 
    A                                                B
0 NaN                               this has Color:Red
1 NaN                           Color: Blue,red, green
2 NaN                                    Color: Yellow
3 NaN  This has many colors. Color: green, red, Yellow
4 NaN             Filter oil Type: Synthetic Motor oil
5 NaN                Oil Type : High Mileage Motor oil

You constructed two conditions:

In [18]: one = (df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df.A.isnull()

In [19]: one
Out[19]: 
0     True
1     True
2     True
3     True
4    False
5    False
dtype: bool

In [20]: two = (df['B'].str.match('.*Type:.*')) & df.A.isnull()

In [21]: two
Out[21]: 
0    False
1    False
2    False
3    False
4     True
5    False
dtype: bool
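Row 5 is False here because its text reads "Type :" with a space before the colon, so the pattern '.*Type:.*' does not match it. If that row is meant to get ID 2 as in the expected output, a sketch like the following may help. The pattern `Type\s*:`, the use of `str.contains`, and the name `two_relaxed` are assumptions about the intent, not part of the original conditions.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'A': [np.nan] * 6,
    'B': [
        'this has Color:Red',
        'Color: Blue,red, green',
        'Color: Yellow',
        'This has many colors. Color: green, red, Yellow',
        'Filter oil Type: Synthetic Motor oil',
        'Oil Type : High Mileage Motor oil',
    ],
})

# \s* allows zero or more whitespace characters between "Type" and ":".
# str.contains searches anywhere in the string, so no leading/trailing .* is needed.
two_relaxed = df['B'].str.contains(r'Type\s*:', regex=True) & df['A'].isnull()
print(two_relaxed.tolist())  # [False, False, False, False, True, True]
```

The rest of this answer keeps the original condition, so row 5 stays unmatched below.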

Here is one way to make a new column. Start with a default of 0, which is what rows matching neither condition will keep.

In [22]: df['C'] = 0

Use the boolean series of your conditions to assign values based on those conditions.

In [23]: df.loc[one,'C'] = 1

In [24]: df.loc[two,'C'] = 2

In [25]: df
Out[25]: 
    A                                                B  C
0 NaN                               this has Color:Red  1
1 NaN                           Color: Blue,red, green  1
2 NaN                                    Color: Yellow  1
3 NaN  This has many colors. Color: green, red, Yellow  1
4 NaN             Filter oil Type: Synthetic Motor oil  2
5 NaN                Oil Type : High Mileage Motor oil  0
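As an alternative to the two .loc assignments, the column can probably be built in one step with numpy.select. This is only a sketch that reuses the two conditions above on a rebuilt copy of the example data; it is a swapped-in technique, not what the session above does. One difference to be aware of: numpy.select takes the first condition that is True for a row, whereas with successive .loc assignments the last one wins.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'A': [np.nan] * 6,
    'B': [
        'this has Color:Red',
        'Color: Blue,red, green',
        'Color: Yellow',
        'This has many colors. Color: green, red, Yellow',
        'Filter oil Type: Synthetic Motor oil',
        'Oil Type : High Mileage Motor oil',
    ],
})

# The same two conditions as in the session above.
one = (df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df['A'].isnull()
two = df['B'].str.match('.*Type:.*') & df['A'].isnull()

# Pick 1 where `one` holds, 2 where `two` holds, and 0 everywhere else.
df['C'] = np.select([one, two], [1, 2], default=0)
print(df['C'].tolist())  # [1, 1, 1, 1, 2, 0]
```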

If df is the input dataframe and fd is the output dataframe matching the pattern, how can I assign an ID directly to fd, without the boolean check?

fd = df.loc[one].copy()  # explicit copy avoids SettingWithCopyWarning on the next line
fd['C'] = 1
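If the goal is to tag each filtered dataframe and then stack them one below the other, as described in the question, DataFrame.assign may be a convenient option because it returns a new dataframe and leaves df untouched. The sketch below rebuilds the example data and the two conditions; the name `out` is hypothetical, and writing to a file is left out because the question does not show how the output is written.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    'A': [np.nan] * 6,
    'B': [
        'this has Color:Red',
        'Color: Blue,red, green',
        'Color: Yellow',
        'This has many colors. Color: green, red, Yellow',
        'Filter oil Type: Synthetic Motor oil',
        'Oil Type : High Mileage Motor oil',
    ],
})

one = (df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df['A'].isnull()
two = df['B'].str.match('.*Type:.*') & df['A'].isnull()

# assign() returns a new dataframe with the extra column; df itself is not modified.
fd = df[one].assign(C=1)
fd2 = df[two].assign(C=2)

# Stack the two tagged dataframes one below the other.
out = pd.concat([fd, fd2])
print(out['C'].tolist())  # [1, 1, 1, 1, 2]
```

Filtering later is then a matter of selecting on the ID, for example out[out['C'] == 2].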

Upvotes: 1
