Reputation: 311
I know in R I can use tidyr for the following:
data_wide <- spread(data_protein, Fraction, Count)
and data_wide will inherit all the columns from data_protein that are not spread.
Protein Peptide Start Fraction Count
1 A 122 F1 1
1 A 122 F2 2
1 B 230 F1 3
1 B 230 F2 4
becomes
Protein Peptide Start F1 F2
1 A 122 1 2
1 B 230 3 4
But in pandas (Python),
data_wide = data_prot2.reset_index(drop=True).pivot('Peptide','Fraction','Count').fillna(0)
doesn't inherit anything not specified in the function (index, key, value). Thus, I decided to join it through df.join():
data_wide2 = data_wide.join(data_prot2.set_index('Peptide')['Start']).sort_values('Start')
But that produces duplicates of the peptides because there are several start values. Is there any more straightforward way to solve this? Or a special parameter for join that omits repeats? Thank you in advance.
Upvotes: 6
Views: 3090
Reputation: 3835
spread
is superseded by pivot_wider
in tidyr
.
How about using datar
that follows tidyr
's API design:
>>> from datar.all import f, tribble, pivot_wider
>>> data_protein = tribble(
... f.Protein, f.Peptide, f.Start, f.Fraction, f.Count,
... 1, "A", 122, "F1", 1,
... 1, "A", 122, "F2", 2,
... 1, "B", 230, "F1", 3,
... 1, "B", 230, "F2", 4,
... )
>>> data_wide = pivot_wider(data_protein, names_from=f.Fraction, values_from=f.Count)
>>> data_wide
Peptide Protein Start F1 F2
0 A 1 122 1 2
1 B 1 230 3 4
I am the author of the package. Feel free to submit issues if you have any questions.
Upvotes: 0
Reputation: 294498
Using stack
:
df.set_index(df.columns[:4].tolist()) \
.Count.unstack().reset_index() \
.rename_axis(None, axis=1)
Upvotes: 1
Reputation: 210922
try this:
In [144]: df
Out[144]:
Protein Peptide Start Fraction Count
0 1 A 122 F1 1
1 1 A 122 F2 2
2 1 B 230 F1 3
3 1 B 230 F2 4
In [145]: df.pivot_table(index=['Protein','Peptide','Start'], columns='Fraction').reset_index()
Out[145]:
Protein Peptide Start Count
Fraction F1 F2
0 1 A 122 1 2
1 1 B 230 3 4
you can also specify Count
column explicitly:
In [146]: df.pivot_table(index=['Protein','Peptide','Start'], columns='Fraction', values='Count').reset_index()
Out[146]:
Fraction Protein Peptide Start F1 F2
0 1 A 122 1 2
1 1 B 230 3 4
Upvotes: 4