Reputation: 31
I am trying to retrive the results to a lab from a provided excel. Unfortunately, the excel data is structured as such:
df = pd.DataFrame({'Tests':['a\nb\nc', 'b\nc\na'], 'Results':['1.2\n3.4\n4', '2.3\n1\n1']})
df
Tests Results
0 a\nb\nc 1.2\n3.4\n4
1 b\nc\na 2.3\n1\n1
I want to be able to return the result that matches test 'a':
Tests Results a result
0 a\nb\nc 1.2\n3.4\n4 1.2
1 b\nc\na 2.3\n1\n1 1
I know I can separate the different lines in the test and/or result column using
tests = df['Tests'].str.split('\n', expand=True)
When I try to apply that into a for loop (I hate for-loops too but can't think of a better method for this) I am not successful:
for row in df:
tests = df['Tests'].str.split('\n', expand=True)
if tests[0] == 'a':
df['result'] = tests[0]
#...and on through each possible row of col of tests expanded
I'm confident a better way exists, any direction is greatly appreciated.
Upvotes: 0
Views: 66
Reputation: 35686
We can try concat
after expanding all columns in df
then do some reshaping with pivot
(assuming each test value appears at most once per cell) to create a reasonable lookup table:
lookup_df = (
pd.concat(
[df[col].str.split('\n', expand=True) for col in df],
keys=df.columns,
axis=1
)
.rename_axis('index')
.stack()
.reset_index()
.pivot(index='index', columns='Tests', values='Results')
.add_prefix('results_') # Optional Column Prefix
.astype('float64') # If wanting Numeric values
)
Tests results_a results_b results_c
index
0 1.2 3.4 4
1 1 2.3 1
This can be join
back to df
:
df = df.join(lookup_df)
df
:
Tests Results results_a results_b results_c
0 a\nb\nc 1.2\n3.4\n4 1.2 3.4 4
1 b\nc\na 2.3\n1\n1 1 2.3 1
Alternatively, assuming the expanded tests and results are guaranteed to have the same dimensions when split and expanded. And we're only looking for one of the test results we can use numpy indexing to find the Results and create the column:
tests = df['Tests'].str.split('\n', expand=True)
results = df['Results'].str.split('\n', expand=True)
df['results_a'] = results.to_numpy()[tests.eq('a').to_numpy()]
df
:
Tests Results results_a
0 a\nb\nc 1.2\n3.4\n4 1.2
1 b\nc\na 2.3\n1\n1 1
Upvotes: 1