Reputation: 704
I have a text file (file1) with the following contents:
col0 col1
g1 text
g2 text,text
g3 text,text,text
g4 text
g5 text,text,text,text,text
I need to modify it using pandas to remove all rows that contain multiple text values. The output should look like this:
col0 col1
g1 text
g4 text
The only difference is that my real files have ~300,000 rows in total.
Upvotes: 2
Views: 73
Reputation: 294488
This answer is based on @MaxU's concept, but adds a layer of generalization that lets you change the condition on how many text values are allowed.
df[df.col1.str.count(',') < 1]
col0 col1
0 g1 text
3 g4 text
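To make the generalization concrete, here is a minimal sketch that wraps the comma count in a helper with an adjustable limit. The helper name `keep_rows_with_at_most`, the `max_values` parameter, and the whitespace-delimited file layout are assumptions made for illustration, not something stated in the question.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file on disk; a whitespace-delimited
# layout is an assumption based on the sample shown in the question.
raw = io.StringIO(
    "col0 col1\n"
    "g1 text\n"
    "g2 text,text\n"
    "g3 text,text,text\n"
    "g4 text\n"
    "g5 text,text,text,text,text\n"
)
df = pd.read_csv(raw, sep=r"\s+")

def keep_rows_with_at_most(frame, max_values, column="col1"):
    """Keep rows whose comma-separated `column` holds at most `max_values` items."""
    # n commas means n + 1 values, so compare the comma count to max_values - 1.
    return frame[frame[column].str.count(",") <= max_values - 1]

single = keep_rows_with_at_most(df, 1)       # same filter as count(',') < 1
up_to_three = keep_rows_with_at_most(df, 3)  # also keeps g2 and g3

print(single)
print(up_to_three)
```

With `max_values=1` this should reduce to the same filter as the one-liner above; raising the limit keeps rows with more values.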
Upvotes: 2
Reputation: 210882
If col1 contains flat strings:
In [94]: df
Out[94]:
col0 col1
0 g1 text
1 g2 text,text
2 g3 text,text,text
3 g4 text
4 g5 text,text,text,text,text
In [95]: df = df.loc[~df.col1.str.contains(',')]
In [96]: df
Out[96]:
col0 col1
0 g1 text
3 g4 text
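If the real file might contain missing values in col1, the negated mask above can fail or behave unexpectedly, because `str.contains` returns a missing value for those rows. The following is a small sketch under that assumption (the `None` entry is invented sample data); it also passes `regex=False`, since a plain comma does not need regex matching.

```python
import pandas as pd

# Sample frame with one missing value in col1 (invented for illustration).
df = pd.DataFrame({
    "col0": ["g1", "g2", "g3", "g4"],
    "col1": ["text", "text,text", None, "text"],
})

# na=False treats missing entries as "no comma found", so the mask is purely boolean.
has_comma = df["col1"].str.contains(",", regex=False, na=False)

# Keep only rows without a comma; the row with the missing value is kept too.
filtered = df.loc[~has_comma]
print(filtered)
```

Whether rows with missing values should be kept or dropped depends on the data; passing `na=True` instead would drop them.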
If col1 contains lists:
In [105]: df
Out[105]:
col0 col1
0 g1 [text]
1 g2 [text, text]
2 g3 [text, text, text]
3 g4 [text]
4 g5 [text, text, text, text, text]
In [106]: df.col1.str.len() < 2
Out[106]:
0 True
1 False
2 False
3 True
4 False
Name: col1, dtype: bool
In [107]: df[df.col1.str.len() < 2]
Out[107]:
col0 col1
0 g1 [text]
3 g4 [text]
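For reference, a list column like the one above can be produced from the flat strings with `str.split`, after which `str.len()` counts the elements of each list. This is only a sketch on the sample data from the question; the variable names `as_lists` and `single_item` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "col0": ["g1", "g2", "g3", "g4", "g5"],
    "col1": ["text", "text,text", "text,text,text", "text",
             "text,text,text,text,text"],
})

# Turn each comma-separated string into a list of values.
as_lists = df.assign(col1=df["col1"].str.split(","))

# On a column of lists, .str.len() gives the number of elements per list.
single_item = as_lists[as_lists["col1"].str.len() < 2]
print(single_item)
```

Changing the `< 2` threshold adjusts how many values per row are allowed.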
Upvotes: 3