Reputation: 47
I need to assign a particular ID to each dataframe produced by the conditions below:
fd = df[(df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df.A.isnull()]
fd2 = df[(df['B'].str.match('.*Type:.*')) & df.A.isnull()]
In the output file, the two dataframes are written one below the other. I need to add a column C where ID '1' is assigned to the rows of fd and ID '2' to the rows of fd2. This would help when filtering the output later.
This is the current output
A B
nan this has Color:Red
nan Color: Blue,red, green
nan Color: Yellow
nan This has many colors. Color: green, red, Yellow
nan Filter oil Type: Synthetic Motor oil
nan Oil Type : High Mileage Motor oil
Expected Output
A B C
nan this has Color:Red 1
nan Color: Blue,red, green 1
nan Color: Yellow 1
nan This has many colors. Color: green, red, Yellow 1
nan Filter oil Type: Synthetic Motor oil 2
nan Oil Type : High Mileage Motor oil 2
Upvotes: 0
Views: 103
Reputation: 23753
You can add a new column C and assign ID 1 or 2 to it based on which regex condition each row matches.
In [17]: df
Out[17]:
A B
0 NaN this has Color:Red
1 NaN Color: Blue,red, green
2 NaN Color: Yellow
3 NaN This has many colors. Color: green, red, Yellow
4 NaN Filter oil Type: Synthetic Motor oil
5 NaN Oil Type : High Mileage Motor oil
You constructed two conditions:
In [18]: one = (df['B'].str.match('.*Color:.*') | df['B'].str.match('.*colorFUL:.*')) & df.A.isnull()
In [19]: one
Out[19]:
0 True
1 True
2 True
3 True
4 False
5 False
dtype: bool
In [20]: two = (df['B'].str.match('.*Type:.*')) & df.A.isnull()
In [21]: two
Out[21]:
0 False
1 False
2 False
3 False
4 True
5 False
dtype: bool
Here is one way to make a new column.
In [22]: df['C'] = 0
Use the boolean series of your conditions to assign values based on those conditions.
In [23]: df.loc[one,'C'] = 1
In [24]: df.loc[two,'C'] = 2
In [25]: df
Out[25]:
A B C
0 NaN this has Color:Red 1
1 NaN Color: Blue,red, green 1
2 NaN Color: Yellow 1
3 NaN This has many colors. Color: green, red, Yellow 1
4 NaN Filter oil Type: Synthetic Motor oil 2
5 NaN Oil Type : High Mileage Motor oil 0
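Row 5 stays at 0 because its text reads 'Type :' with a space before the colon, so the pattern '.*Type:.*' does not match it. A minimal sketch of one way to get the expected output, assuming it is acceptable to allow optional whitespace before the colon (the `Type\s*:` pattern and the switch to `str.contains` are my assumptions, not part of the original question):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [np.nan] * 6,
    "B": [
        "this has Color:Red",
        "Color: Blue,red, green",
        "Color: Yellow",
        "This has many colors. Color: green, red, Yellow",
        "Filter oil Type: Synthetic Motor oil",
        "Oil Type : High Mileage Motor oil",
    ],
})

# str.contains searches anywhere in the string, so no leading/trailing '.*' is needed.
one = df["B"].str.contains(r"Color:|colorFUL:") & df["A"].isnull()
# Assumption: allow optional whitespace between 'Type' and ':'.
two = df["B"].str.contains(r"Type\s*:") & df["A"].isnull()

df["C"] = 0            # default ID for rows matching neither condition
df.loc[one, "C"] = 1
df.loc[two, "C"] = 2

print(df["C"].tolist())  # [1, 1, 1, 1, 2, 2]
```

If a row could match both conditions, the later assignment wins, so the order of the two `df.loc` lines matters.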
If df is the input dataframe and fd is the output dataframe matching the pattern, how can I assign an ID to fd directly, without the boolean check?
fd = df.loc[one].copy()  # copy first, so the assignment below does not raise SettingWithCopyWarning
fd['C'] = 1
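As a hedged alternative for tagging the filtered frames directly, `DataFrame.assign` returns a new frame with the extra column, and `pd.concat` stacks the results the way the question's output file does. This sketch reuses the question's patterns unchanged, so the 'Type :' row is not selected; the name `out` is only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [np.nan] * 6,
    "B": [
        "this has Color:Red",
        "Color: Blue,red, green",
        "Color: Yellow",
        "This has many colors. Color: green, red, Yellow",
        "Filter oil Type: Synthetic Motor oil",
        "Oil Type : High Mileage Motor oil",
    ],
})

# The two filters exactly as written in the question.
one = (df["B"].str.match(".*Color:.*") | df["B"].str.match(".*colorFUL:.*")) & df["A"].isnull()
two = df["B"].str.match(".*Type:.*") & df["A"].isnull()

# assign() builds a new frame, so df itself is left untouched
# and no SettingWithCopyWarning is raised.
fd = df[one].assign(C=1)
fd2 = df[two].assign(C=2)

# 'out' is a hypothetical name for the stacked result written to the file.
out = pd.concat([fd, fd2])
print(out)
```

Filtering later is then a plain comparison, for example `out[out["C"] == 2]`.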
Upvotes: 1