Python

Python web crawling practice: the Algorithm Basics site: get_text

mcdn 2020. 8. 24. 17:50

https://code.plus/course/41

 


I want to extract the items in this list!

And this is what the page source looks like:

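import requests
from bs4 import BeautifulSoup

# Download the course page and parse it with the built-in HTML parser.
webpage = requests.get("https://code.plus/course/41")
soup = BeautifulSoup(webpage.content, "html.parser")

# Select every <li> once; positions 10 to 29 hold the entries I want.
items = soup.select("li")
for x in range(10, 30):
    print(items[x].get_text())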
C:\Users\user\PycharmProjects\untitled4\venv\Scripts\python.exe C:/Users/user/PycharmProjects/untitled4/next.py
스택
단어 뒤집기
괄호
스택 수열
에디터
큐
조세퍼스 문제
덱
단어 뒤집기 2
쇠막대기
오큰수
오등큰수
후위 표기식2
후위 표기식
알파벳 개수
알파벳 찾기
문자열 분석
단어 길이 재기
ROT13
네 수

Process finished with exit code 0

 

Ta-da!
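The script above depends on positions 10 to 29 in the page's full list of li elements, and those positions shift whenever the site's menus change. Here is a minimal sketch of two sturdier options: slice the selected list once, or scope the selector to the container that holds the list. The HTML is a made-up stand-in so the example runs offline; the class names nav and problems are assumptions, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the course page; the class names "nav" and
# "problems" are assumptions for illustration, not the site's real markup.
SAMPLE_HTML = """
<ul class="nav"><li>Home</li><li>Courses</li></ul>
<ul class="problems"><li>스택</li><li>단어 뒤집기</li><li>괄호</li></ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: run the selector once, then slice the resulting list.
items = soup.select("li")
by_position = [li.get_text(strip=True) for li in items[2:5]]

# Option 2: scope the selector to a container, so no index numbers are needed.
by_container = [li.get_text(strip=True) for li in soup.select("ul.problems li")]

print(by_position)
print(by_container)
```

Both lists hold the same three entries here. On the real page you would have to check the source for a suitable container before using the second option.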

 

 

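import requests
from bs4 import BeautifulSoup

webpage = requests.get("https://code.plus/course/41")
soup = BeautifulSoup(webpage.content, "html.parser")

# select() returns a list of every element with class "timeline".
print(soup.select(".timeline"))
# Index one element before calling get_text(); [1] is the second chapter.
print(soup.select(".timeline")[1].get_text())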

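timeline is a class, so the selector needs a leading dot: .timeline.

select() returns a list of matching elements, not a single element, so calling get_text() directly on its result raises an error. Pick one element first, for example with [1].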

C:\Users\user\PycharmProjects\untitled4\venv\Scripts\python.exe C:/Users/user/PycharmProjects/untitled4/next.py
[<div class="timeline"><div class="timeline-item timeline-item-bordered"><div class="timeline-entry rounded hidden-xs">01<span>챕터</span><div class="timeline-vline hidden-xs"></div></div><h2 class="uppercase bold size-20"><span class="hidden-sm hidden-md hidden-lg">01 </span>알고리즘 시작</h2><div class="chapter-row margin-bottom-10 flex-container">알고리즘 시작<span class="right-pull flex-item weight-normal padding-right-10">00:19:44</span></div></div></div>, <div class="timeline"><div class="timeline-item timeline-item-bordered"><div class="timeline-entry rounded hidden-xs">02<span>챕터</span><div class="timeline-vline hidden-xs"></div></div><h2 class="uppercase bold size-20"><span class="hidden-sm hidden-md hidden-lg">02 </span>자료구조 1</h2><div class="chapter-row margin-bottom-10 flex-container">스택<span class="right-pull flex-item weight-normal padding-right-10">00:38:36</span></div><div class="chapter-row margin-bottom-10 flex-container">큐와 덱<span class="right-pull flex-item weight-normal padding-right-10">00:06:40</span></div><div class="chapter-row margin-bottom-10 flex-container">연습<span class="right-pull flex-item weight-normal padding-right-10">00:17:59</span></div></div></div>, <div class="timeline"><div class="timeline-item timeline-item-bordered"><div class="timeline-entry rounded hidden-xs">03<span>챕터</span><div class="timeline-vline hidden-xs"></div></div><h2 class="uppercase bold size-20"><span class="hidden-sm hidden-md hidden-lg">03 </span>수학 1</h2><div class="chapter-row margin-bottom-10 flex-container">수학 1<span class="right-pull flex-item weight-normal padding-right-10">00:28:42</span></div><div class="chapter-row margin-bottom-10 flex-container">연습<span class="right-pull flex-item weight-normal padding-right-10">00:06:01</span></div></div></div>, <div class="timeline"><div class="timeline-item timeline-item-bordered"><div class="timeline-entry rounded hidden-xs">04<span>챕터</span><div class="timeline-vline hidden-xs"></div></div><h2 class="uppercase bold 
size-20"><span class="hidden-sm hidden-md hidden-lg">04 </span>다이나믹 프로그래밍 1</h2><div class="chapter-row margin-bottom-10 flex-container">다이나믹 프로그래밍 소개<span class="right-pull flex-item weight-normal padding-right-10">00:22:15</span></div><div class="chapter-row margin-bottom-10 flex-container">1, 2, 3 더하기까지<span class="right-pull flex-item weight-normal padding-right-10">00:25:07</span></div><div class="chapter-row margin-bottom-10 flex-container">이친수까지<span class="right-pull flex-item weight-normal padding-right-10">00:21:17</span></div><div class="chapter-row margin-bottom-10 flex-container">합분해까지<span class="right-pull flex-item weight-normal padding-right-10">00:30:47</span></div><div class="chapter-row margin-bottom-10 flex-container">연습<span class="right-pull flex-item weight-normal padding-right-10">00:30:30</span></div><div class="chapter-row margin-bottom-10 flex-container">도전<span class="right-pull flex-item weight-normal padding-right-10">00:15:37</span></div></div></div>]
02챕터02 자료구조 1스택00:38:36큐와 덱00:06:40연습00:17:59

Process finished with exit code 0

That is how this part of the page comes out.
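The flat string printed by get_text() glues the chapter number, titles and durations together. Below is a sketch of one way to pull them apart, using a trimmed copy of the chapter 02 markup from the output above. The class names come from that output; the function name parse_chapter and the dictionary layout are my own choices for illustration. The sketch also shows why an index is needed: select() returns a list-like ResultSet, and only the individual elements in it have get_text().

```python
from bs4 import BeautifulSoup

# Trimmed copy of one .timeline block from the output above.
SAMPLE_HTML = """
<div class="timeline"><div class="timeline-item timeline-item-bordered">
<h2 class="uppercase bold size-20"><span class="hidden-sm hidden-md hidden-lg">02 </span>자료구조 1</h2>
<div class="chapter-row margin-bottom-10 flex-container">스택<span class="right-pull flex-item weight-normal padding-right-10">00:38:36</span></div>
<div class="chapter-row margin-bottom-10 flex-container">큐와 덱<span class="right-pull flex-item weight-normal padding-right-10">00:06:40</span></div>
</div></div>
"""


def parse_chapter(timeline):
    """Return the chapter heading and its (lesson, duration) pairs."""
    # A separator keeps "02" and the title from running together.
    heading = timeline.select_one("h2").get_text(" ", strip=True)
    lessons = []
    for row in timeline.select(".chapter-row"):
        duration = row.select_one("span").get_text(strip=True)
        # The lesson title is the row's first text node, before the <span>.
        title = next(row.stripped_strings)
        lessons.append((title, duration))
    return {"heading": heading, "lessons": lessons}


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
timelines = soup.select(".timeline")

# The ResultSet itself has no get_text(); pick an element first.
try:
    timelines.get_text()
except AttributeError:
    print("select() returned a list; index it before calling get_text()")

chapter = parse_chapter(timelines[0])
print(chapter)
```

Passing a separator to get_text() is often enough when you only want readable text; parsing each row separately is worth it when you need the durations as their own values.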

 

 

 

2. Notice board page
https://code.plus/notice/list/1
 


The info class marks the pinned notices on that page.

import requests
from bs4 import BeautifulSoup

# Fetch the first page of the notice board.
webpage = requests.get("https://code.plus/notice/list/1")
soup = BeautifulSoup(webpage.content, "html.parser")

print(soup.select(".info")[1].get_text())
print(soup.select(".info")[4].get_text())

[1] prints the second match, but [4] raises IndexError: list index out of range, because the page has fewer than five elements with the info class.

C:\Users\user\PycharmProjects\untitled4\venv\Scripts\python.exe C:/Users/user/PycharmProjects/untitled4/next.py
알고리즘 강의 챕터, 슬라이드 구성일 년 전일 년 전
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/untitled4/next.py", line 9, in <module>
    print(soup.select(".info")[4].get_text())
IndexError: list index out of range

Process finished with exit code 1
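Since select() returns a list-like result, any index past its length raises IndexError. Here is a minimal sketch of two ways to avoid the crash: check the length before indexing, or loop over whatever matched. The HTML is a made-up stand-in with four info elements (the real notice page's markup and count are not shown in this post), and text_at is a hypothetical helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up stand-in: four elements with class "info" (the count is an assumption).
SAMPLE_HTML = "".join(
    f'<div class="info">notice {n}</div>' for n in range(1, 5)
)


def text_at(elements, index, default=None):
    """Return the text of elements[index], or default if index is out of range."""
    if 0 <= index < len(elements):
        return elements[index].get_text(strip=True)
    return default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
infos = soup.select(".info")

print(len(infos))         # how many matches there actually are
print(text_at(infos, 1))  # second match
print(text_at(infos, 4))  # out of range, so the default is returned

# Iterating never goes out of range, whatever the count is.
for position, info in enumerate(infos):
    print(position, info.get_text(strip=True))
```

Printing len() first is the quickest way to see which indexes are valid on a page you have not inspected yet.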

 
