Reputation: 177
I have a data frame like this:
Row Author Cit_Handle Year Title Handle
1 Carlos Hi 2017 how to be ReP:55:er45
2 Boris Sla 2018 what it it? ReP:ef5:ag4g
3 Dante Ur 2017 is it true? ReP:f9gj:sfona9:2039
4 ReP:fb9:d93
5 Jure Les 2016 ¡it is true! ReP:odjva:ejewojaef:advon
6 Mark Cas 2018 How do ReP:apnvb:qt42rwb:203
7 ReP:gjh:59f
I want to copy each Cit_Handle value into the rows above it, going up until it reaches another Cit_Handle value or the column header, like this:
Row Author Cit_Handle Year Title Handle
1 Carlos Hi ReP:fb9:d93 2017 how to be ReP:55:er45
2 Boris Sla ReP:fb9:d93 2018 what it it? ReP:ef5:ag4g
3 Dante Ur ReP:fb9:d93 2017 is it true? ReP:f9gj:sfona9:2039
4 Jure Les ReP:gjh:59f 2016 ¡it is true! ReP:odjva:ejewojaef:advon
5 Mark Cas ReP:gjh:59f 2018 How do ReP:apnvb:qt42rwb:203
If you want to see a sample of the real data you can see it here.
Any idea how I can do it?
Upvotes: 1
Views: 39
Reputation: 28313
The output you describe can be achieved by backfilling Cit_Handle and then removing the rows where the other fields are empty. The code on line In [5] does all the processing.
In [1]: import pandas as pd
In [2]: text ="""Author,Cit_Handle,Year,Title,Handle
...: Carlos Hi,,2017,how to be,ReP:55:er45
...: Boris Sla,,2018,what it it?,ReP:ef5:ag4g
...: Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
...: ,ReP:fb9:d93,,,
...: Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
...: Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
...: ,ReP:gjh:59f,,,"""
In [3]: from io import StringIO
In [4]: df = pd.read_csv(StringIO(text),sep=',')
In [5]: df.bfill()[df.Author.notnull()]
Out[5]:
Author Cit_Handle Year Title Handle
0 Carlos Hi ReP:fb9:d93 2017.0 how to be ReP:55:er45
1 Boris Sla ReP:fb9:d93 2018.0 what it it? ReP:ef5:ag4g
2 Dante Ur ReP:fb9:d93 2017.0 is it true? ReP:f9gj:sfona9:2039
4 Jure Les ReP:gjh:59f 2016.0 ¡it is true! ReP:odjva:ejewojaef:advon
5 Mark Cas ReP:gjh:59f 2018.0 How do ReP:apnvb:qt42rwb:203
One tiny note: the default int dtype in pandas can't contain NaNs, so in this process the Year column is upcast to float.
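If the float Year column is a problem, one possible variation is to backfill only Cit_Handle, drop the rows that carried nothing else, and cast Year back to an integer afterwards. This is a minimal sketch on the same sample data, not the only way to do it; the variable name result and the reset_index call are my own additions.

```python
from io import StringIO

import pandas as pd

text = """Author,Cit_Handle,Year,Title,Handle
Carlos Hi,,2017,how to be,ReP:55:er45
Boris Sla,,2018,what it it?,ReP:ef5:ag4g
Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
,ReP:fb9:d93,,,
Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
,ReP:gjh:59f,,,"""

df = pd.read_csv(StringIO(text))

# Backfill only Cit_Handle, so no other column is filled from the rows below.
df["Cit_Handle"] = df["Cit_Handle"].bfill()

# Drop the rows that only carried a Cit_Handle, renumber the index, and cast
# Year back to int (safe here because no remaining row has a missing Year).
result = (
    df[df["Author"].notna()]
    .reset_index(drop=True)
    .astype({"Year": "int64"})
)

print(result)
```

The cast only works once every remaining Year is present. If some years could still be missing, the nullable Int64 dtype in pandas can hold them without upcasting to float.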
Upvotes: 1