Posts in the 'Web/crawling' category

Scraping (crawling) with the requests module: the web page loads, but you get a 503 error

2021. 9. 20. 17:10 Web/crawling

When the web page loads fine but a 503 error keeps coming back (e.g. remoteok.io), the site has scraping protection in place. Fix:

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',}
    r = requests.post(url, headers=headers)
    r.raise_for_status()

    def export_remote_jobs():
        word = 'python'
        SEARCH_URL = f"https://remoteok.io/remote-dev+{word}-jobs?hide_sticky=&compact_mode=true&l..
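The fix in the excerpt above comes down to sending a browser-like User-Agent header with the request. Here is a minimal sketch of that idea, not the post's full code. The helper names `build_request` and `fetch` are hypothetical, and it assumes a plain GET is what you want for reading a page (the excerpt itself calls `requests.post`). Whether a given site accepts this header is up to the site.

```python
import requests

# Browser-like User-Agent string, copied from the excerpt above.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    ),
}


def build_request(url, headers=None):
    """Prepare (but do not send) a GET request with browser-like headers.

    Hypothetical helper: extra headers passed in override the defaults.
    """
    merged = dict(BROWSER_HEADERS)
    merged.update(headers or {})
    return requests.Request("GET", url, headers=merged).prepare()


def fetch(url, timeout=10):
    """Send the request and raise on any 4xx/5xx status, 503 included."""
    with requests.Session() as session:
        response = session.send(build_request(url), timeout=timeout)
        response.raise_for_status()
        return response.text
```

Calling `fetch(url)` returns the page HTML, or raises `requests.HTTPError` if the server still answers with an error status. Splitting out `build_request` makes it easy to inspect the headers without touching the network.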

Python BeautifulSoup: main attributes

2021. 9. 8. 18:59 Web/crawling

1. Find all a tags
   soup.find_all("a")
   soup("a")
2. Find all strings inside the title tag
   soup.title.find_all(string=True)
   soup.title(string=True)
3. Get only two a tags
   soup.find_all("a", limit=2)
4. Search by string
   soup.find_all(string="Elsie")  # strings equal to "Elsie"
   soup.find_all(string=["Tillie", "Elsie", "Lacie"])  # OR search
   soup.find_all(string=re.compile("Dormouse"))  # regular expression
5. p tags that have title as an attribute value
   soup.find_all("p..
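To try items 1 to 4 from the list above, here is a small runnable sketch. The HTML snippet is an assumption made up for illustration (modeled on the names used in the excerpt), and it uses Python's built-in "html.parser" so no extra parser is needed. Item 5 is cut off in the excerpt, so it is left out.

```python
import re

from bs4 import BeautifulSoup

# Sample document, invented for this example.
HTML = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a> and
<a href="http://example.com/tillie" id="link3">Tillie</a>.</p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# 1. All a tags; calling the soup object is shorthand for find_all.
all_links = soup.find_all("a")
same_links = soup("a")

# 2. Every string inside the title tag.
title_strings = soup.title.find_all(string=True)

# 3. Stop after the first two a tags.
first_two = soup.find_all("a", limit=2)

# 4. String search: exact match, any-of list, and regular expression.
exact = soup.find_all(string="Elsie")
any_of = soup.find_all(string=["Tillie", "Elsie", "Lacie"])
by_regex = soup.find_all(string=re.compile("Dormouse"))

print(len(all_links), len(first_two))
print([str(s) for s in any_of])
print([str(s) for s in by_regex])
```

Note that the string searches return text nodes, not tags, and results come back in document order regardless of the order of the list you pass in.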

crawling) Loading a web page as HTML

2021. 3. 28. 22:25 Web/crawling

crawling) Generating a PDF file

2021. 3. 28. 22:17 Web/crawling
