""" robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html
"""

import urlparse
import urllib

__all__ = ["RobotFileParser"]


class RobotFileParser:
    """ This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    """

    def __init__(self, url=''):
        self.entries = []
        self.default_entry = None
        self.disallow_all = False
        self.allow_all = False
        self.set_url(url)
        self.last_checked = 0

    def mtime(self):
        """Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        """
        return self.last_checked

    def modified(self):
        """Sets the time the robots.txt file was last fetched to the
        current time.

        """
        import time
        self.last_checked = time.time()

    def set_url(self, url):
        """Sets the URL referring to a robots.txt file."""
        self.url = url
        self.host, self.path = urlparse.urlparse(url)[1:3]

    def read(self):
        """Reads the robots.txt URL and feeds it to the parser."""
        opener = URLopener()
        f = opener.open(self.url)
        lines = [line.strip() for line in f]
        f.close()
        self.errcode = opener.errcode
        if self.errcode in (401, 403):
            self.disallow_all = True
        elif self.errcode >= 400:
            self.allow_all = True
        elif self.errcode == 200 and lines:
            self.parse(lines)

    def _add_entry(self, entry):
        if "*" in entry.useragents:
            # the default entry is considered last
            if self.default_entry is None:
                # the first default entry wins
                self.default_entry = entry
        else:
            self.entries.append(entry)
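
    # parse() below is a small line-oriented state machine:
    #   state 0 -- at the start, or an entry was just closed
    #   state 1 -- collecting User-agent lines, no rules seen yet
    #   state 2 -- collecting rule lines; a blank line or a fresh
    #              User-agent line closes the current entry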

    def parse(self, lines):
        """parse the input lines from a robots.txt file.
           We allow that a user-agent: line is not preceded by
           one or more blank lines."""
        state = 0
        linenumber = 0
        entry = Entry()

        for line in lines:
            linenumber += 1
            if not line:
                if state == 1:
                    entry = Entry()
                    state = 0
                elif state == 2:
                    self._add_entry(entry)
                    entry = Entry()
                    state = 0
            # remove optional comment and strip line
            i = line.find('#')
            if i >= 0:
                line = line[:i]
            line = line.strip()
            if not line:
                continue
            line = line.split(':', 1)
            if len(line) == 2:
                line[0] = line[0].strip().lower()
                line[1] = urllib.unquote(line[1].strip())
                if line[0] == "user-agent":
                    if state == 2:
                        self._add_entry(entry)
                        entry = Entry()
                    entry.useragents.append(line[1])
                    state = 1
                elif line[0] == "disallow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], False))
                        state = 2
                elif line[0] == "allow":
                    if state != 0:
                        entry.rulelines.append(RuleLine(line[1], True))
                        state = 2
        if state == 2:
            self._add_entry(entry)
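
    # can_fetch() below reduces the requested URL to its quoted path
    # component before matching (an empty path becomes "/"), since rule
    # lines only ever contain paths.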

    def can_fetch(self, useragent, url):
        """using the parsed robots.txt decide if useragent can fetch url"""
        if self.disallow_all:
            return False
        if self.allow_all:
            return True
        # search for given user agent matches
        # the first match counts
        url = urllib.quote(urlparse.urlparse(urllib.unquote(url))[2]) or "/"
        for entry in self.entries:
            if entry.applies_to(useragent):
                return entry.allowance(url)
        # try the default entry last
        if self.default_entry:
            return self.default_entry.allowance(url)
        # agent not found ==> access granted
        return True

    def __str__(self):
        return ''.join([str(entry) + "\n" for entry in self.entries])


class RuleLine:
    """A rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path."""
    def __init__(self, path, allowance):
        if path == '' and not allowance:
            # an empty value means allow all
            allowance = True
        self.path = urllib.quote(path)
        self.allowance = allowance

    def applies_to(self, filename):
        return self.path == "*" or filename.startswith(self.path)

    def __str__(self):
        return (self.allowance and "Allow" or "Disallow") + ": " + self.path
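
# Note that RuleLine matches by plain string prefix (str.startswith), so a
# rule like "Disallow: /foo" also covers "/foobar" and "/foo/bar".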


class Entry:
    """An entry has one or more user-agents and zero or more rulelines"""
    def __init__(self):
        self.useragents = []
        self.rulelines = []

    def __str__(self):
        ret = []
        for agent in self.useragents:
            ret.extend(["User-agent: ", agent, "\n"])
        for line in self.rulelines:
            ret.extend([str(line), "\n"])
        return ''.join(ret)

    def applies_to(self, useragent):
        """check if this entry applies to the specified agent"""
        # split the name token and make it lower case
        useragent = useragent.split("/")[0].lower()
        for agent in self.useragents:
            if agent == '*':
                # we have the catch-all agent
                return True
            agent = agent.lower()
            if agent in useragent:
                return True
        return False

    def allowance(self, filename):
        """Preconditions:
        - our agent applies to this entry
        - filename is URL decoded"""
        for line in self.rulelines:
            if line.applies_to(filename):
                return line.allowance
        return True


class URLopener(urllib.FancyURLopener):
    def __init__(self, *args):
        urllib.FancyURLopener.__init__(self, *args)
        self.errcode = 200

    def prompt_user_passwd(self, host, realm):
        ## If robots.txt file is accessible, ignore user/password
        return None, None

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        self.errcode = errcode
        return urllib.FancyURLopener.http_error_default(self, url, fp,
                                                        errcode, errmsg,
                                                        headers)
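

if __name__ == '__main__':
    # Minimal usage sketch, not part of the original module; the
    # example.com URL is illustrative only.
    rp = RobotFileParser()
    rp.set_url('http://www.example.com/robots.txt')
    rp.read()
    print rp.can_fetch('*', 'http://www.example.com/private/page.html')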