Reputation: 804
I'm making a test-team report from excel input; using pandas to gather, filter, process data.
I made below code to make product-testcase cover table for later use/easy search. With 3rd column being type of test case. I have multiple testcases inside one excel so I need to go through all of cells and split tests to make pairs product - test case.
Because I'm not much familiar with pandas and I haven't found better way elsewhere I would like to ask if there is more pythonic way or easier in pandas way to do the same and more efficient.
code with example data ( \n is newline inside excel cell):
df = pd.DataFrame({"prod":["TS001","TS002"],
"activate":["001_002\n001_004", "003_008\n024_080"],
"deactivate":["004_005\n006_008", "001_008"]})
df = df.set_index("prod")
list_of_tuples = []
for i, row in df.iterrows():
for cell in row.iteritems():
for test in cell[-1].splitlines():
list_of_tuples.append((i, test, cell[0])) # [(product, test, category)..]
return_df = pd.DataFrame(list_of_tuples, columns=('prod', 'testcase', 'category'))
producing:
prod testcase category
0 TS001 001_002 activate
1 TS001 001_004 activate
2 TS001 004_005 deactivate
3 TS001 006_008 deactivate
4 TS002 003_008 activate
5 TS002 024_080 activate
6 TS002 001_008 deactivate
Thank you for any suggestions.
Upvotes: 3
Views: 107
Reputation: 294506
With a comprehension
pd.DataFrame(
[(p, t, c) for (p, c), r in df.stack().items() for t in r.split()],
columns=['prod', 'testcase', 'category']
)
prod testcase category
0 TS001 001_002 activate
1 TS001 001_004 activate
2 TS001 004_005 deactivate
3 TS001 006_008 deactivate
4 TS002 003_008 activate
5 TS002 024_080 activate
6 TS002 001_008 deactivate
Explanation
df.stack()
prod
TS001 activate 001_002\n001_004
deactivate 004_005\n006_008
TS002 activate 003_008\n024_080
deactivate 001_008
dtype: object
When iterating over df.stack().items()
, we get tuples with index value as the first element and value as the second element. Because we stacked, the index value itself is a tuple. So the first pair looks like:
(('TS001', 'activate'), '001_002\n001_004')
Through subsequent iteration over '001_002\n001_004'.split()
and rearranging the unpacked elements, we get
[(p, t, c) for (p, c), r in df.stack().items() for t in r.split()]
[('TS001', '001_002', 'activate'),
('TS001', '001_004', 'activate'),
('TS001', '004_005', 'deactivate'),
('TS001', '006_008', 'deactivate'),
('TS002', '003_008', 'activate'),
('TS002', '024_080', 'activate'),
('TS002', '001_008', 'deactivate')]
I then wrap this in a pd.DataFrame
constructor where I name the columns.
Upvotes: 4
Reputation: 77027
Here's one way and there could be better ways of doing it. Check piRSquared's - should be the most efficient among the ones here.
In [2807]: (df.set_index('prod')
.applymap(lambda x: x.split('\n'))
.stack()
.apply(pd.Series)
.stack()
.reset_index(name='testcase')
.rename(columns={'level_1': 'category'})
.drop('level_2', 1))
Out[2807]:
prod category testcase
0 TS001 activate 001_002
1 TS001 activate 001_004
2 TS001 deactivate 004_005
3 TS001 deactivate 006_008
4 TS002 activate 003_008
5 TS002 activate 024_080
6 TS002 deactivate 001_008
Details
In [2809]: df
Out[2809]:
activate deactivate prod
0 001_002\n001_004 004_005\n006_008 TS001
1 003_008\n024_080 001_008 TS002
Upvotes: 1
Reputation: 863501
Use:
set_index
like your solutionapply
for working with multiple columns with lambda function for DataFrame
by split
and expand=True
and reshape by stack
unstack
for columns to Multiindex
reset_index
first for remove level and then for Multiindex
to columnsreindex_axis
for change order of columnsdf = (df.set_index('prod')
.apply(lambda x: x.str.split('\n', expand=True).unstack())
.stack()
.reset_index(level=0, drop=True)
.reset_index(name='testcase')
.rename(columns={'level_1':'category'})
.reindex_axis(['prod','testcase','category'], axis=1))
print (df)
prod testcase category
0 TS001 001_002 activate
1 TS001 004_005 deactivate
2 TS002 003_008 activate
3 TS002 001_008 deactivate
4 TS001 001_004 activate
5 TS001 006_008 deactivate
6 TS002 024_080 activate
Upvotes: 2
Reputation: 403050
Using df.applymap
, df.melt
, and df.stack
df = df.applymap(str.split).reset_index().melt('prod', \
['activate', 'deactivate']).set_index(['prod', 'variable'])
df = pd.DataFrame(df.value.tolist(), index=df.index)\
.stack().reset_index().drop('level_2', 1)
df.columns = ['prod', 'category', 'testcase']
df
prod category testcase
0 TS001 activate 001_002
1 TS001 activate 001_004
2 TS002 activate 003_008
3 TS002 activate 024_080
4 TS001 deactivate 004_005
5 TS001 deactivate 006_008
6 TS002 deactivate 001_008
Upvotes: 1