Instagram returns "HTTP/1.1 301 Moved Permanently" on each request
to "https://instagram.com/" because the correct location is
"https://www.instagram.com/".
Instagram will respond with "HTTP/1.1 301 Moved Permanently" if the
URI for the requested user doesn't end with a slash.
Notice: This is only a minor enhancement to prevent error 301 from
happening. The previous version worked fine as is.
Adds additional messages to the error log when fetching contents. The
data is helpful in finding issues with receiving contents from servers.
References: #879, #882, #884
Exceptions are reported to users, but they do not necessarily appear
in the error log on the server. Using 'error_log' we can explicitly
write exceptions and error messages to the log file, using the
standard PHP message format.
For more information see https://stackoverflow.com/a/26867035
Using the classical www.zone-telechargement1.org as base URL, the bridge will
always be redirected to the actual wwX.zone-telechargement1.org final URL. This
makes the bridge more robust to URL changes.
This commit collects the original contents from a different
tag to prevent this issue. The root cause is unknown but closely
related to the regex.
References #877
The function 'defaultLinkTo' applied to the source HTML does break
regex matches later in the bridge. We need to apply the function
right before adding the contents to the item for the bridge to work
properly.
References #856
- Base URL updated
- Show name has different styles on the Website, use another way to get the show name
- Entry URIs are now unique to make sure RSS readers don't treat episodes as duplicates
- No more new lines in the feed or item title
- Entry URIs are unique to make sure RSS readers don't treat episodes as duplicates
- Titles are unique to make sure RSS readers don't treat streams and downloads as duplicates
The bridge has been expanded to better cover the whole GQ magazine.
It should support all countries (provided they all use the same absurdly shitty publication system).
It is guaranteed to be only tested with sexactu articles (that I now obtain by loading Maïa Mazaurette author page).
This commit adds a new optional parameter 'limit' which can be used
to limit the number of items returned by this bridge (i.e. '&limit=10')
As requested in #669
* [Exceptions] Don't return header for bridge exceptions
* [Exceptions] Add link to list in exception message
This is an alternative when the button is not rendered
for some reason.
* [index] Don't return bridge exception for formats
* [index] Return feed item for bridge exceptions
* [BridgeAbstract] Rename 'getCacheTime' to 'getModifiedTime'
* [BridgeAbstract] Move caching to index.php to separate concerns
index.php needs more control over caching behavior in order to cache
exceptions. This cannot be done in a bridge, as the bridge might be
broken, thus preventing caching from working.
This also (and more importantly) separates concerns. The bridge should
not even care if caching is involved or not. Its purpose is to collect
and provide data.
Response times should be faster, as more complex bridge functions like
'setDatas' (evaluates all input parameters to predict the current
context) and 'collectData' (collects data from sites) can be skipped
entirely.
Notice: In its current form, index.php takes care of caching. This
could, however, be moved into a separate class (i.e. CacheAbstract)
in order to make implementation details cache specific.
* [index] Add '_error_time' parameter to $item['uri']
This ensures that error messages are recognized by feed readers as
new errors after 24 hours. During that time the same item is returned
no matter how often the cache is cleared.
References https://github.com/RSS-Bridge/rss-bridge/issues/814#issuecomment-420876162
* [index] Include '_error_time' in the title for errors
This prevents feed readers from "updating" feeds based on the title
* [index] Handle "HTTP_IF_MODIFIED_SINCE" client requests
Implementation is based on `BridgeAbstract::dieIfNotModified()`,
introduced in 422c125d8e and
simplified based on https://stackoverflow.com/a/10847262
Basically, before returning cached data we check if the client send
the "HTTP_IF_MODIFIED_SINCE" header. If the modification time is
more recent or equal to the cache time, we reply with "HTTP/1.1 304
Not Modified" (same as before). Otherwise send the cached data.
* [index] Don't encode exception message with `htmlspecialchars`
* [Exceptions] Include error message in exception
* [index] Show different error message for error code 0
The URI "https://facebook.com/username?_fb_noscript=1" returns two
posts per user. Some profiles, however, are very active, causing the
bridge to miss items if more than two posts are send within the cache
duration (5 minutes).
The alternative suggested in #669 is to use a different URI:
"https://facebook.com/pg/username/posts?_fb_noscript=1"
While the contents of this URI essentially look the same when viewed
in a browser, it actually returns more than 10 posts depending on the
profile.
References #669
Adds a new class 'ParameterValidator' to replace the functions from
'validator.php', separating private functions from 'validateData' to
class private functions in the process.
Instead of echoing error messages, adds messages to a private variable,
accessible via 'getInvalidParameters'.
BridgeAbstract now adds invalid parameter names to the error message.
- Use 'array_diff_key' instead of 'unset'
- Remove parameters for caches
By removing certain parameters for caches, the loading times can be
improved considerably:
* action: It doesn't matter which action the user took to generate
feed items.
* format: This has the biggest impact on performance, because cached
items are now shared between different formats (i.e. try switching
between Atom, Html and Mrss and compare previous vs. now). If a
server handles lots of requests, this may even reduce bandwidth if
the same contents are requested for different formats.
* _noproxy: The proxy behavior has no impact on the produced items,
so it can be ignored.
* _cache_timeout: This is another option which might impact performance
for some servers, especially if 'custom_timeout' has been enabled in
the configuration. Requests with different cache timeouts no longer
result in separate cache files.