Reputation: 181
Consider I have a dataframe:
a = [['A','def',2,3],['B|C','xyz|abc',56,3],['X|Y|Z','uiu|oi|kji',65,34],['K','rsq',98,12]]
df1 = pd.DataFrame(a, columns=['1', '2','3','4'])
df1
1 2 3 4
0 A def 2 3
1 B|C xyz|abc 56 3
2 X|Y|Z uiu|oi|kji 65 34
3 K rsq 98 12
First, how do I print all the rows that has "|" in column 1? I am trying the following but it prints all rows of the frame:
df1[df1[1].str.contains("|")]
Second, how do I split the column 1 and column 2 on "|", so that each split in column 1 gets its corresponding split from column 2 and the rest of the data is appended to each split. For example, I want something like this from df1:
1 2 3 4
0 A def 2 3
1 B xyz 56 3
2 C abc 56 3
3 X uiu 65 34
4 Y oi 65 34
5 Z kji 65 34
6 K rsq 98 12
Upvotes: 3
Views: 142
Reputation: 862611
You can use custom lambda function with Series.str.split
and Series.explode
for columns specified in list and then add all another columns in DataFrame.join
:
splitter = ['1','2']
cols = df1.columns.difference(splitter)
f = lambda x: x.str.split('|').explode()
df1 = df1[splitter].apply(f).join(df1[cols]).reset_index(drop=True)
print (df1)
1 2 3 4
0 A def 2 3
1 B xyz 56 3
2 C abc 56 3
3 X uiu 65 34
4 Y oi 65 34
5 Z kji 65 34
6 K rsq 98 12
For filter by |
what is special regex character or
add regex=False
to Series.str.contains
:
print(df1[df1[1].str.contains("|" regex=False)])
Or escape it by \|
:
print(df1[df1[1].str.contains("\|")])
Upvotes: 3